
Using a Crop-Pest Ontology to Facilitate Image Retrieval



USING A CROP-PEST ONTOLOGY TO FACILITATE IMAGE RETRIEVAL

By

SOONHO KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005


Copyright 2005 by Soonho Kim


To my father and mother.


ACKNOWLEDGMENTS

My first thanks go to my adviser, Dr. Howard W. Beck. His kindness, enthusiasm, and support made all the difference in my academic career. I had no background knowledge of agricultural information before I met him. He taught me, from scratch, several basic components needed for my research, such as Java, the concepts of object-oriented database management systems, artificial intelligence, and ontologies. Every Friday for five years, Dr. Beck met with me to discuss my research and to advise me. I could not have finished my dissertation without him.

My thanks also go to Dr. James J. Jones, who served on my committee. From his course, I learned how to build a biological system model. He taught me participatory design for developing applications for agricultural users. Whenever I implemented applications in my research, I kept in mind the importance of considering users and communicating with them.

I thank Dr. Fedro Zazueta, who served on my committee. He opened my eyes to the roles of information technology in agricultural extension service and introduced me to the information technology currently used in agricultural extension service in the U.S. In addition, he helped me learn how my research could contribute to improving extension service in South Korea through information technology.

I thank Dr. Tim Momol, who served on my committee. He gave me a great deal of help in understanding concepts about crops and pests. He introduced me to concepts in plant pathology, which were the basis for developing the crop-pest ontology, a major component of my research.


I thank Dr. Joachime Hammer, who served on my committee. He provided guidance on the basic concepts of database management systems and the Semantic Web. I also wish to thank the other members of my academic community at the University of Florida, particularly Dr. Daniel W. Lee and Mary Hall. I am deeply grateful to my friends (Mcnair Bostic, Rohit Badal, Yeonchul Jeong, and Chris Davison) for their support during a difficult period for our field.

Outside my academic community, I shared many tears and laughs with Angela Brammer, who corrected the grammar of this dissertation. She did her best to help me finish it. I thank Inok Kim, who takes care of my baby. She cares for my daughter as if she were her own third child. I know I am very lucky to have met her in Gainesville.

The dissertation was a long process, and much has changed in this time. My daughter, Bonny Koo, was born during this period, and she is a sweet hope for the future. Whenever I felt that I was not smart enough to write a dissertation, she kept me trying. I thank my husband, Jawoo Koo. He is my best friend and my partner. My mother-in-law, Kyungsun Whang, and my father-in-law, Chiwhe Koo, encouraged me to pursue my Ph.D. degree and gave me a lot of help in finishing this dissertation. Finally, I thank my father, Jongko Kim, and my mother, Bokrae Kim. They put all their effort into taking care of me throughout their lives; without them, I could not have become a Ph.D. I thank my brothers, Jaeho Kim and Kwangho Kim, as well. I thank God, who makes me happy every moment.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  Statement of Problems
    Limitation of Finding Relevant Images
    Limitation of Helping Users to Find Proper Keywords
  Image Retrieval Using a Thesaurus
  An Approach to Image Retrieval Using an Ontology
  Related Works in the Agriculture Field
  Contributions
  Overview of Chapters
2 THE CROP-PEST ONTOLOGY
  Introduction
  Terminology
    OWL
    Component of the Crop-Pest Ontology
  Methodology for Building the Crop-Pest Ontology
    Purpose and Domain of the Crop-Pest Ontology
    Consideration of Reuse of Existing Ontologies
    Enumeration of Important Terms in the Crop-Pest Ontology
    Building Classes and the Class Hierarchy
    Defining Properties
    Creating Individuals
  The Crop-Pest Ontology
  Validation of the Crop-Pest Ontology
  Evaluation of the Crop-Pest Ontology
  Conclusion and Discussion
3 A PRACTICAL COMPARISON BETWEEN THESAURUS AND ONTOLOGY TECHNIQUES AS A BASIS FOR SEARCH IMPROVEMENT
  Introduction
  The National Agricultural Library Thesaurus
    Relationships between Terms
  Comparative Analysis
    Representing Domain Knowledge
      Concepts, Semantic Relationships, and Their Logical Consistency
      The Ability to Represent Complicated Concepts
    Reasoning Based on Representation
      Reasoning Facilities
      Searching Documents
      Automatic Validation of Logical Consistency
  Conclusion
4 BUILDING A DATABASE AND GRAPHICAL USER INTERFACE FOR BROWSING IMAGES BASED ON THE CROP-PEST ONTOLOGY
  Introduction
  Ontology-Based Image Indexing
    Creating Concepts for Indexing Images in the Crop-Pest Ontology
    The Indexing Process
      Step 1: Syntactic and Semantic Analysis of Each Image Caption
      Step 2: Creating an Individual of the Image in the Crop-Pest Ontology
      Step 3: Filling in the Values of Properties for Each Individual
      Step 4: Connecting Assigned Values into Classes/Individuals in the Crop-Pest Ontology
      Step 5: Saving the Individual into the Crop-Pest Ontology
  An Interface for Browsing Images
    Goals
    Features to be Supported
    The Graphical Interface
  Usability Study
    The Keyword-Based Search Interface
    Hypotheses to Be Tested
    Design and Procedure
    Results
  Conclusion
5 ONTOLOGY-ASSISTED INFORMATION EXTRACTION
  Introduction
  Components in Information Extraction System
    Ontology
    Phrase Patterns
    Island Chart Parser
    Semantic Structure
  Results of the Information Extraction System
  Conclusion
6 CONCLUSION AND FUTURE DIRECTIONS

APPENDIX
A 291 IMAGE CAPTIONS
B A LIST OF WORDS APPEARING ON 291 IMAGE CAPTIONS
C 138 TERMS USING THE EVALUATION OF THE CROP-PEST ONTOLOGY AND THE AGROVOC ONTOLOGY
D THE CROP-PEST ONTOLOGY WRITTEN IN OWL
E QUESTIONNAIRE ON EVALUATION

REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

Table
1-1 Mean Precision and Relative Recall of search engines during 2004
2-1 The coverage of 138 tested terms in the crop-pest and AGROVOC ontologies
4-1 The result of the question “How many empty results were found when using this interface?”
4-2 Participants’ responses to the question “Did this interface help you learn more about crops, pests, and relationships between them?”


LIST OF FIGURES

Figure
1-1 An illustration of precision and recall. Precision is expressed as the percentage of retrieved documents that are relevant.
2-1 Image captions in the image collection
2-2 Concepts from SUMO that compose the upper level of the crop-pest ontology
2-3 The ten most frequently appearing terms in the 291 image captions
2-4 Class hierarchy from the root class “thing” down to the class “insect” and its subclass “southern green stink bug,” showing top-level, middle-level and bottom-level classes
2-5 The object property “has_developmental_stage_of” and the datatype property “number_of_legs” in the class “insect”
2-6 An individual of class “soybean.” All properties are filled with each value.
2-7 Results of the consistency check of the crop-pest ontology using the Pellet OWL reasoner
3-1 Description of a term plants in NALT (“…” denotes additional terms not shown)
3-2 Description of a concept Plant from the crop-pest ontology, written in Web Ontology Language (OWL)
3-3 Three terms, corn earworm, peanut, and larva, in the NALT thesaurus
3-4 Properties assigned to the class corn earworm for describing the concept “large corn earworm larva on peanut leaf”
3-5 The OWL abstract form for the individual of a class “corn earworm” to represent a complicated concept “large corn earworm larva on peanut leaf”
3-6 The OWL abstract form for an inference “corn earworm is a peanut pest”
3-7 Screen shot of the preferred term “plant pests” in NALT
3-8 Screen shots of OWL consistency checker and the results in Pellet
4-1 Hierarchy from the root class “thing” to “digital photograph”
4-2 Overview of indexing images associated with the crop-pest ontology
4-3 Syntactical and semantic analysis of the image caption “damage caused by three cornered alfalfa hopper on soybean”
4-4 The interface that provides a facility to browse 291 images
4-5 Expanding a node (orange rectangle) shows an image
4-6 A facility for showing properties. A property is shown by an edge with a label. The thick end of each edge represents the domain.
4-7 The facility to show the selected images with related concepts in the crop-pest ontology
4-8 The facility to show all images related to a particular class in the crop-pest ontology
4-9 Screenshot of the keyword-based search interface implemented with Egothor
5-1 The information extraction system in the crop-pest domain
5-2 The hierarchical structure of phrase patterns for a concept “plant part”
5-3 Components of the island chart parser with a simple input string
5-4 A parse tree (left) and the semantic structure (right) based on the parse tree for the caption “cotton stainer adult in a white cotton bloom”


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

USING A CROP-PEST ONTOLOGY TO FACILITATE IMAGE RETRIEVAL

By

Soonho Kim

December 2005

Chair: Howard W. Beck
Major Department: Agricultural and Biological Engineering

Professionals in the agricultural field, such as growers, Extension agents, and researchers, need a facility to organize and locate photographic images related to their work, especially as the volume of such images continues to increase. However, current keyword-based image retrieval suffers from relatively low precision and recall. A new approach to image retrieval in the agricultural field, based on an ontology, addresses the limited support that keyword-based retrieval gives users in finding proper images: users browse images associated with formal descriptions of the meanings of words and the relationships between them. Two hundred and ninety-one images were used to develop the approach in a particular domain covering crops and related pests. A “crop-pest ontology” was created to represent the concepts describing the images. The ontology contains crops and related pests, the relationships between them, and the environmental factors affecting them. A practical comparison between the crop-pest ontology and the existing National Agricultural Library Thesaurus (NALT) was made to contrast the similarities and differences between a thesaurus and an ontology. The comparison shows that the crop-pest ontology has better formal representation capabilities, avoiding ambiguity, and supports inferences that are not possible in a thesaurus such as NALT.

To enable browsing of images associated with the crop-pest ontology, images were indexed based on the ontology. The indexing process included manual syntactic and semantic analyses of each image caption, but such analysis has a high labor cost. Therefore, a process of semi-automatic analysis was designed using natural language-based information extraction techniques, which include a parser, a grammar described by phrase patterns, and the crop-pest ontology. A graphical interface was implemented for browsing images associated with concepts in the crop-pest ontology. A usability study indicates that participants encountered fewer empty results when retrieving images using the crop-pest ontology. Moreover, it shows that image retrieval using the crop-pest ontology helps users find relevant images by transferring domain knowledge to them.

This research 1) shows the development of the crop-pest ontology, 2) analyzes the differences and similarities between an ontology and a thesaurus, 3) explores automatic information extraction as a way to reduce the manual labor required for ontology-based indexing, and 4) develops a new approach that uses the ontology to index and browse images so that professionals in the agricultural field can retrieve images more easily and accurately.


CHAPTER 1
INTRODUCTION

Images are a major component of agricultural information systems. As the number of available images increases rapidly, finding relevant images in a timely and efficient manner becomes more difficult. Agricultural professionals such as growers, Extension agents, and researchers need a facility that makes it easier to retrieve images from a collection in the agricultural domain. There are two standard approaches to image retrieval: content-based and text-based. In content-based image retrieval, images are searched using features such as color, texture, shape, and spatial location. An example of this approach is the PicSOM system (Koskela, 2000). In text-based retrieval, searches are based on textual descriptions such as image captions. Since content-based retrieval is still not suitable for most applications, online image retrieval engines such as Google (Google, 2005) employ text-based image retrieval.

Statement of Problems

A typical method of text-based image retrieval uses keywords. Keyword-based image retrieval retrieves text, such as image captions or descriptions of images, by using indexes of the words appearing in that text. In its simplest form, a search engine indexes every word occurring in every piece of text associated with the images in the collection to be searched. Users describe their interests through one or more keywords. If the keywords appear in the indexes, the search engine shows the images whose text contains those keywords. This keyword-based image retrieval approach has two general limitations (Hyvonen et al., 2003).


Limitation of finding relevant images: The appearance of a keyword in a text does not necessarily mean that the text is relevant to the user’s interest, and relevant text does not necessarily contain the exact keyword typed by the user.

Limitation of helping users to find proper keywords: The user does not necessarily know which keywords to type to find images. The keyword-based approach is not useful unless the user is familiar with the kinds of images in a collection and knows the terms used to describe relevant images.

Limitation of Finding Relevant Images

The limitation of finding relevant images is formally measured in terms of recall and precision, illustrated in Figure 1-1. Recall is the ratio of the number of relevant documents retrieved to the total number of relevant documents in the collection. Precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved. Recall and precision are usually expressed as percentages.

Figure 1-1. An illustration of precision and recall. Precision is expressed as the percentage of retrieved documents that are relevant. Recall is expressed as the percentage of relevant documents retrieved out of all the relevant documents in the collection.
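The simplest form of keyword search described above, and the two measures just defined, can be sketched in a few lines. This is a minimal illustration, not the retrieval system evaluated in this dissertation: it assumes a whitespace-tokenized inverted index with boolean AND matching, and the captions, image IDs, and relevance judgments are made up. The caption of img2 says “worm” rather than “larva,” so a search for “larva” misses it, which lowers recall; this is the vocabulary problem discussed later in this section.

```python
from collections import defaultdict

# Hypothetical captions and relevance judgments, invented for illustration only.
SAMPLE_CAPTIONS = {
    "img1": "corn earworm larva on peanut leaf",
    "img2": "worm feeding on soybean pod",
    "img3": "velvetbean caterpillar larva on soybean",
    "img4": "stink bug damage on cotton leaf",
}
# Images a user looking for larvae would judge relevant (img2 also shows a larva).
RELEVANT_TO_LARVA = {"img1", "img2", "img3"}


def build_index(captions):
    """Inverted index: lower-cased word -> set of image IDs whose caption contains it."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for word in caption.lower().split():
            index[word].add(image_id)
    return index


def keyword_search(index, query):
    """Return the images whose captions contain every query word (boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|, recall = hits / |relevant| (0.0 if a denominator is 0)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    index = build_index(SAMPLE_CAPTIONS)
    found = keyword_search(index, "larva")
    p, r = precision_recall(found, RELEVANT_TO_LARVA)
    print(sorted(found), f"precision={p:.0%} recall={r:.0%}")
```

Running the script prints precision 100% and recall 67%: everything retrieved is relevant, but one relevant image is never found because its caption uses a different word.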


The most desirable retrieval approach would be one with high precision and high recall: it would find all, and only, the images that are relevant to a user’s interest. Blair reported that, in a practical evaluation of a document retrieval system containing roughly 350,000 pages of text, the average recall of the system was less than 20 percent (Blair et al., 1985). The precision values ranged from 19.6 percent to 100 percent. Blair stated that recall was low because keyword-based retrieval is difficult to use when pieces of text are retrieved by subject. Shafi reported that the precision and recall of three search engines, AltaVista (AltaVista, 2005), Google (Google, 2005), and HotBot (HotBot, 2005), were below 30%, as shown in Table 1-1. Precision and recall were estimated from the first ten results returned by each search engine for scholarly information.

Table 1-1. Mean Precision and Relative Recall of search engines during 2004

             AltaVista   Google   HotBot
Precision    27%         29%      28%
Recall       18%         20%      29%

The design of keyword-based retrieval rests on the false assumption that users can easily predict the exact words and phrases that appear in the texts they would find most useful (and only in those texts). This assumption comes from the basic but flawed idea that the “statistical aspects” of words, such as their occurrence, location, and frequency, can be used to predict their meanings comprehensively. Therefore, one way of achieving higher precision and recall would be to take the meanings of words into consideration. Understanding the following characteristics of words can help a retrieval method adapt to the meanings of words.

Words can have several meanings. For example, the word “beetle” can mean an insect belonging to a large order characterized by a modified outer pair of wings that forms a hard covering for the inner pair (Encarta, 2005). The word “beetle” can also refer to a car manufactured by Volkswagen whose shape is reminiscent of the insect.

Different words can have the same meaning. For example, the word “worm” can mean an elongated, soft-bodied insect, and the word “larva” can mean a wingless, elongated, soft-bodied immature insect that has hatched from an egg. The different words “worm” and “larva” thus point to the same meaning.

Words can have a wide variety of associations. For example, the word “beetle” is associated with the word “soybean” because a beetle is a pest of soybeans. In addition, “beetle” is associated with the words “egg,” “larva,” “pupa,” and “adult,” which are the names of the developmental stages in a beetle’s life.

Limitation of Helping Users to Find Proper Keywords

The limitation of helping users to find proper keywords can be addressed by providing a facility for browsing images associated with well-structured knowledge in a particular field. Yee reported that current keyword-based retrieval such as Google Image Search (Google Image Search, 2005) did not allow users to browse images (Yee, 2003). Markkula reported that professionals in artistic fields such as journalism, design, and art direction use browsing as a basic strategy in searching for images, because some words describing the selected images may be difficult to express as search keywords but are easily applied once the images are seen. Similarly, growers and other professionals in the agricultural field would want to be able to browse images as well as search for them with keywords.

A facility that shows users the relationships within an image collection can address this limitation of keyword-based image retrieval by presenting the domain knowledge that the images convey. The result set generated by keyword-based retrieval may miss interesting aspects of an image collection, because the images in the collection are related to each other through many relationships.
For example, an image presenting “stink bug damage on cotton leaf” can be retrieved in the result set for the keyword “stink bug.” However, the interesting relationships between “cotton leaf” and “damage” cannot be shown to users, even though these relationships might give users a clue for finding relevant images.

Image Retrieval Using a Thesaurus

A thesaurus is a list of terms related to a particular subject that describes the related terms for each item. Its primary purposes are indexing documents and helping users retrieve information more easily. It is organized in a hierarchical structure based on the following interrelationships of terms:

Broader Term (BT): A particular term is more general than another term.
Narrower Term (NT): A particular term is more specific than another term.
Related Term (RT): Two terms are associated.
Use For (UF): A particular term is the preferred term among a set of synonymous terms.

One of the main contributions of a thesaurus to image retrieval can be the reformulation and expansion of users’ requests to address the low precision and recall of keyword-based image retrieval. A thesaurus could be used to retrieve more relevant images by expanding users’ requests with related terms (RT), which might increase recall. In addition, a thesaurus could be used to avoid retrieving non-relevant information by using narrower terms (NT). Approaches that retrieve images using a thesaurus have been developed to address the low precision and recall of keyword-based image retrieval by considering the interrelationships between terms. Dalmau reported that integrating thesaurus relationships into search and browsing in an online photograph collection significantly improved the user’s discovery experience (Dalmau et al., 2005). When a user’s request was found in the thesaurus, the result page provided search suggestions based on broader terms (BT) or narrower terms (NT) so that users could broaden or refine a result set. In addition, the search retrieved all narrower terms of a user’s request if the request matched a term in the thesaurus (Dalmau, 2005).

However, Hersh assessed the expansion of users’ requests with thesaurus relationships as a way of improving search performance (Hersh et al., 2000). Queries in a test collection were expanded using the synonym, BT, NT, and RT relationships in the Unified Medical Language System (UMLS) Metathesaurus (UMLS Metathesaurus, 2005). Hersh reported that thesaurus-based query expansion generally caused a decline in retrieval performance.

A thesaurus can also be used as a tool to browse images. Dalmau argued that a thesaurus directs users to more access points for each image. In addition, he argued that using a thesaurus to browse images provides disambiguation. However, Chun stressed that a thesaurus has a limited number of relations, which gives it relatively meager expressiveness for representing specific knowledge and can result in ambiguous relations (Chun 2004).

An Approach to Image Retrieval Using an Ontology

A new approach to image retrieval using an ontology is introduced to deal with the two limitations of keyword-based image retrieval: 1) finding relevant images and 2) helping users to find proper keywords. The word ontology originally comes from philosophy, where it refers to the subject of existence.[1] Computer scientists later adopted the term to support the sharing and reuse of formally represented knowledge in computer systems (Gruber 1993). An ontology is defined as a collection of concepts and their relationships that describes knowledge in a particular domain. An ontology is described in a formal way that makes its concepts understandable to a machine. A concept is a set of things that we perceive in the world. A concept has a set of properties that must be true of each member of the set denoted by the concept. A concept can be represented by one of three formal elements in an ontology: an individual, a class, or a property.

Individual: An individual is defined as a real object in the world.
Classes and subclasses: A class defines a set of enumerated individuals that belong together according to their common properties. A class can be a subclass of another class: every member of the subclass satisfies the necessary and sufficient conditions of the superclass. Subclasses satisfy the requirements of their superclass and add further restrictions; superclasses are generalizations of the common properties of their subclasses.
Properties: Properties are defined as relationships between individuals or between individuals and data values (such as strings and integers).
Domain: The domain of a property is defined as the set of individuals to which the property is applied.
Range: The range of a property is defined as the set of individuals that the property can have as its values.

[1] http://www-ksl.stanford.edu/kst/what-is-an-ontology.html

Concepts in the crop-pest domain, which includes crops and related pests, can be described in a crop-pest ontology. This ontology includes concepts such as plants, pests, relationships between plants and insects such as damage, and environmental concepts such as soil. The concept “plant” is described using the class “plant.” The particular concept “damages” can be represented as a property between “insect” and “plant.” Classes, properties, and individuals are described in a formal, machine-readable form (the crop-pest ontology is written in OWL) so that programs can process them.

The word ontology has also become popularly associated with an idea for the next generation of the Web, called the Semantic Web (Semantic Web, 2005). The Semantic Web purports to be a universal medium for information exchange that gives the content of documents on the Web meaning in a machine-processable form.
Currently, the Web is based on documents written in HTML, a language that describes a

PAGE 21

body of structured text, focusing on a desired visual layout. However, HTML cannot describe the meaning of the information contained within the documents themselves. For example, with HTML we can present a page that lists pesticides, and the HTML code of this page can make simple, document-level statements such as "This document's title is 'Pesticides.'" But HTML itself has no facility for relaying more complex statements, such as "AZOXYSTROBIN is a pesticide with a unit cost of $1.38." Rather, HTML can only say that the text "AZOXYSTROBIN" should be positioned near the words "Pesticides" and "$1.38." HTML cannot indicate that AZOXYSTROBIN is a type of pesticide, or even assert that $1.38 is a unit price. The Semantic Web addresses this limitation of the current Web through ontologies and extensions of current Web markup languages, which will play key roles in describing the richer semantics of Web documents by providing sources of shared, precisely defined terms.

The rich semantics of ontologies can also provide better retrieval and indexing of images. TextPresso² is a biological publication system that uses ontologies to catalog and retrieve literature. In TextPresso, using an ontology resulted in a threefold increase in search efficiency in the specific field of biological gene-to-gene interaction (Müller, 2004). Hyvonen developed an ontology-based image retrieval system for 600 photographs in the Helsinki University Museum (Hyvonen, 2003). A promotion ontology was used to annotate the images and to support image retrieval. He stressed that image retrieval using the promotion ontology helped users find relevant images even when they initially lacked knowledge about the domain.

² http://www.textpresso.org
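To make the AZOXYSTROBIN example concrete, the statement that HTML cannot express can be written as subject-predicate-object triples, the data model underlying Semantic Web languages such as RDF and OWL. The sketch below uses plain Python tuples rather than an RDF library, and the `ex:` names and the `ex:unitCost` predicate are hypothetical, not drawn from any published vocabulary.

```python
# A minimal sketch: the pesticide statement from the HTML example encoded as
# subject-predicate-object triples. The "ex:" names are hypothetical stand-ins
# for the IRIs a real RDF/OWL vocabulary would define.
triples = {
    ("ex:AZOXYSTROBIN", "rdf:type", "ex:Pesticide"),  # AZOXYSTROBIN is a type of pesticide
    ("ex:AZOXYSTROBIN", "ex:unitCost", 1.38),         # its unit price, in US dollars
}


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def instances_of(cls):
    """Return every subject explicitly typed as the given class."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


print(instances_of("ex:Pesticide"))               # {'ex:AZOXYSTROBIN'}
print(objects("ex:AZOXYSTROBIN", "ex:unitCost"))  # {1.38}
```

Unlike the HTML page, which only places the strings near each other, the triples let a program ask directly which things are pesticides and what a given pesticide costs.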

PAGE 22

Related Works in the Agriculture Field

Existing image repositories in agriculture employ keyword-based image retrieval, a browsing tool with a few levels of categories, or both. The Plant Diagnostic Information System (PDIS, 2005) and the Digital Diagnostic and Information System (DDIS, 2005) provide keyword-based image search, so image search in PDIS and DDIS may suffer from low recall and precision. In both systems, each image has searchable text that describes its annotations, circumstances, and other relevant information; unless users are familiar with this text, it can be difficult to find the proper keywords to retrieve images. The Agricultural Research Service provides an image gallery containing more than 2000 images (ARS, 2005). It provides keyword-based image search and therefore shares the limitations of conventional keyword-based search. In addition, it provides nine simple categories to help users find relevant images: animals, crops, education, field research, fruits & veggies, illustrations, insects, lab research, and plants. Each category shows all images classified into it, and there are no subcategories. The nine categories are too general to describe the full contents of the images, so users can miss image content that might be important to them. Insect Images (Insect Images, 2005) provides keyword-based retrieval of insect images and supports browsing images by insect scientific name. However, because it employs conventional keyword-based search, Insect Images still suffers from low recall and precision.

Other image repositories in agriculture provide only facilities for browsing images by category. The Texas Agricultural Extension Service provides browsing of cotton insect images through a few levels of categories. Those categories are not enough to

PAGE 23

represent the content of images. Users need a tool that represents the contents of images more precisely.

Contributions

The main contributions of this dissertation are summarized as follows:
A new technology, ontologies, was adapted to an agricultural information system to address the image retrieval problem.
An ontology describing crops and related pests, called the crop-pest ontology, was built in the formal Web Ontology Language, OWL.
A practical methodology for building the ontology was developed.
As a first step toward image indexing, information was manually extracted, based on the crop-pest ontology, from image captions provided by a scientist working on crops and insects.
Building on these results, images were indexed with the crop-pest ontology to enable browsing.
A new graphical interface was created for browsing images indexed with the crop-pest ontology.
Methods of automatic information extraction were explored as a way to reduce the manual labor required for ontology-based indexing.

Overview of Chapters

Chapter 2 presents the development of the crop-pest ontology, which covers crops, pests, the relations between them, and the environmental factors surrounding them. Building on earlier methodologies for ontology construction, a practical methodology for the agricultural field is introduced. In addition, validation and evaluation of the resulting ontology are discussed. Chapter 3 presents the differences between the crop-pest ontology and a similar approach, the thesaurus. Jacob argued that a controlled vocabulary is itself an ontology, so long as the standard concept of a controlled vocabulary is similarly redefined (Jacob, 2004). This chapter discusses a comparative

PAGE 24

analysis between the crop-pest ontology and a well-known agricultural thesaurus, the National Agricultural Library Thesaurus (NALT), in order to test Jacob's argument. The analysis compares their representational and inferential abilities, which lend more power to information retrieval, and its results are reported. In addition, the results are discussed in terms of addressing the limitations of keyword-based image retrieval. Chapter 4 presents the process of indexing each image with the crop-pest ontology; this process was based on syntactic and semantic analysis. A graphical interface for browsing images with the crop-pest ontology is introduced, and a preliminary evaluation is presented. Chapter 5 presents the process of information extraction, which aided in indexing 150 images with the crop-pest ontology. Information extraction is a process that identifies useful information about a domain in natural language text and converts that information to a structured form that can be saved into a database. The process uses a parser to map words appearing in each image caption to concepts in the crop-pest ontology, and this mapping helps build indexes of the images. Chapter 6 summarizes conclusions and identifies future directions.
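The caption-to-concept mapping described for Chapter 5 can be pictured with a small sketch. It is not the parser used in this dissertation: it assumes a hypothetical lexicon that pairs surface phrases with ontology class names, and it uses greedy longest-match lookup so that a multiword name such as “southern green stink bug” wins over its shorter substring “stink bug.”

```python
import re

# Hypothetical lexicon: surface phrases -> crop-pest ontology class names.
LEXICON = {
    "southern green stink bug": "southern_green_stink_bug",
    "stink bug": "stink_bug",
    "lesser cornstalk borer": "lesser_cornstalk_borer",
    "thrips": "thrips",
    "larva": "larva",
    "cotton": "cotton",
    "peanut": "peanut",
    "leaf": "leaf",
}
MAX_WORDS = max(len(phrase.split()) for phrase in LEXICON)


def map_caption(caption):
    """Greedy longest-match mapping of caption words to ontology concepts."""
    words = re.findall(r"[a-z-]+", caption.lower())
    concepts, i = [], 0
    while i < len(words):
        # Try the longest phrase starting at word i first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON:
                concepts.append(LEXICON[phrase])
                i += n
                break
        else:
            i += 1  # no concept starts at this word; skip it
    return concepts


print(map_caption("Southern green stink bug on cotton leaf."))
# ['southern_green_stink_bug', 'cotton', 'leaf']
```

Words with no matching concept (such as “on” or “closeup”) are simply skipped; a real parser would also need to handle plurals such as “leaves,” misspellings, and the relations between concepts (for example, that the bug is on the leaf).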

PAGE 25

CHAPTER 2
THE CROP-PEST ONTOLOGY

Introduction

An ontology represents domain knowledge using concepts and relationships expressed in a formal, machine-processable language. Building a domain-specific ontology is a process of capturing domain knowledge in this formal language. Defining the purpose and intended uses of the ontology first is a crucial step. The crop-pest ontology was built to facilitate retrieval in a collection of images taken by a scientist working on crops and pests at the University of Florida. The collection contains 291 images showing three crops (soybean, peanut, and cotton) and the insects that damage them. The crop-pest ontology covers at least the domain knowledge contained in the image collection. This chapter introduces a methodology for developing the crop-pest ontology and illustrates each step with specific examples. The resulting crop-pest ontology is then presented, followed by its validation and evaluation.

Terminology

Before the procedure for building the ontology can be discussed, the components of the crop-pest ontology and some terminology used during its development must be introduced.

PAGE 26

OWL

OWL, the Web Ontology Language, is a semantic markup language for publishing and sharing data using ontologies on the Web. OWL has three sublanguages: OWL Lite, OWL DL, and OWL Full. OWL provides machine-processable information on the Web, and it provides the formal basis of the crop-pest ontology.

Components of the Crop-Pest Ontology

The crop-pest ontology consists of classes, subclasses, properties, subproperties, domains, ranges, and individuals in a hierarchical structure.

Individual: An individual is defined as a thing we perceive in the world. For example, when one sees a green plant in a field, that specific plant can be assigned as an individual.
Classes and subclasses: A class defines a set of enumerated individuals that belong together according to their common properties. For example, a class Organism can be defined as any individual that has six common properties: movement, feeding, respiration, growth, reproduction, and sensitivity to stimuli. These six properties are necessary conditions for membership in Organism, which keeps the ontology logically consistent. Thus, a virus is not an organism, because a virus cannot reproduce itself without a host. Classes are organized in a hierarchical structure using subclasses. A class can be a subclass of another class whenever its members satisfy the necessary conditions of that other class. For example, a class Plant could be defined as a subclass of the class Organism; from this statement, we can deduce that if an individual is a plant, then it is also an organism. Subclasses satisfy the requirements of their superclass and add further restrictions, and superclasses generalize the common properties of their subclasses. All classes in the crop-pest ontology are subclasses of Thing, the single root of the crop-pest ontology.
Properties and subproperties: Properties are defined as relationships between individuals or between individuals and data values (such as strings and integers). For example, properties of the class Organism can include “has parts,” “locate in,” “cause damage to,” “has color,” and “has age.” Properties fall into two categories: properties that relate an individual to a member of a certain class (object properties) and properties that relate an individual to a data value (datatype properties). For example, the property “cause damage to” relates an organism to a member of the class Organism that can be damaged, whereas the property “has age” relates an organism to an integer value such as ten. As with the hierarchy of classes, a property can be a subproperty of one or more other properties. For example, the property “cause damage to” can have a subproperty “cause feeding damage to.” We can likewise conclude that if a member of a class is related to another by the property

PAGE 27

“cause feeding damage to,” then it is also related to the other by the property “cause damage to.” The individuals and data types participating in a property can be restricted using domain and range.
Domain: The domain of a property is defined as the set of individuals to which the property is applied. For example, assume that the property “cause damage to” has the domain Pest. Then, if A causes damage to B, A must be a pest.
Range: The range of a property is defined as the set of individuals that the property has as its values. For example, the property “cause damage to” may be assigned the range Plant. By deduction, if A causes damage to B, then B must be a plant.

Methodology for Building the Crop-Pest Ontology

Noy pointed out some fundamental rules of ontology development. First, the best method involves focusing on the intended application. Second, after an initial version of an ontology is defined, it is best to revise the ontology by using it in applications and by discussing it with experts in the field. Third, concepts in the ontology should be close to the objects (physical or logical) and relationships in the domain; for example, in sentences that describe domain knowledge, nouns are likely to be objects and verbs are likely to be relationships (Noy and McGuinness, 2001). Kalyanpur outlined a casual Web ontology development process (Kalyanpur et al., 2004) that emphasized the following: ontology developers start with certain domain information they want to model, and from that information they derive a loose terminology of concepts and relationships in the domain. The concepts are structured into a hierarchy and associated with their properties, and the ontology is refined by browsing and searching concepts.
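The subclass, subproperty, domain, and range definitions above each license a simple deduction. The following sketch, a hypothetical illustration rather than how the crop-pest ontology or Pellet is implemented, forward-chains those four rules over a set of triples until no new facts appear. The class names, the choice of Pest as a subclass of Insect, and the example assertion linking “yellow_stripe_armyworm_16” to “soybean_20” are assumptions made for the example.

```python
# Hypothetical schema fragments built from the examples in this section.
SUBCLASS_OF = {"Pest": "Insect", "Insect": "Organism", "Plant": "Organism", "Organism": "Thing"}
SUBPROPERTY_OF = {"cause_feeding_damage_to": "cause_damage_to"}
DOMAIN = {"cause_damage_to": "Pest"}
RANGE = {"cause_damage_to": "Plant"}


def infer(facts):
    """Apply subproperty, domain, range, and subclass rules until nothing new is derived."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p in SUBPROPERTY_OF:                 # s p o, p subproperty of q  =>  s q o
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in DOMAIN:                         # s p o, domain(p) = C  =>  s is a C
                new.add((s, "type", DOMAIN[p]))
            if p in RANGE:                          # s p o, range(p) = C  =>  o is a C
                new.add((o, "type", RANGE[p]))
            if p == "type" and o in SUBCLASS_OF:    # s is a C, C subclass of D  =>  s is a D
                new.add((s, "type", SUBCLASS_OF[o]))
        if new <= facts:                            # fixed point: no new facts derived
            return facts
        facts |= new


facts = infer({("yellow_stripe_armyworm_16", "cause_feeding_damage_to", "soybean_20")})
print(("yellow_stripe_armyworm_16", "cause_damage_to", "soybean_20") in facts)  # True
print(("soybean_20", "type", "Organism") in facts)                             # True
```

The domain and range rules add type facts rather than rejecting statements, which is how RDF Schema and OWL treat domain and range. A full OWL DL reasoner such as Pellet handles far more constructs than these four rules.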
Uschold proposed a more detailed skeletal methodology for building ontologies (Uschold and Gruninger, 1996). His methodology was as follows:

PAGE 28

1. Identification of purpose and scope.
2. Ontology capture: identification of the key concepts and relationships in the domain; production of precise, unambiguous text definitions for those concepts and relationships; and identification of terms to refer to them.
3. Ontology coding: explicit representation of the ontology in some formal language.
4. Integration of existing ontologies.
5. Evaluation of the ontology.
6. Documentation of the ontology.

Noy also outlined a guide to creating an ontology (Noy and McGuinness, 2001). Her guidelines were similar to Uschold's but focused on more practical aspects, using the example of “wine and food.” Her simple guidelines for developing an ontology were the following:
1. Define classes in the ontology.
2. Arrange the classes in a taxonomic hierarchy.
3. Define properties (slots) and describe the allowed values for them.
4. Fill in the property values for individuals.

In general, the crop-pest ontology was built according to Noy's methods. The methodology for developing the crop-pest ontology is as follows:
1. Definition of the purpose and domain of the crop-pest ontology
2. Consideration of reuse of existing ontologies
3. Enumeration of important terms in the crop-pest ontology
4. Building classes and the class hierarchy
5. Definition of the properties of classes
6. Creation of individuals

PAGE 29

Purpose and Domain of the Crop-Pest Ontology

The first step in developing the crop-pest ontology was to define its purpose and domain. The purpose of the crop-pest ontology is to serve as a tool for browsing and searching 291 images and their captions. To fulfill this purpose, the crop-pest ontology needed to cover all concepts in the image captions. Figure 2-1 shows some of those captions.

Thrips damage to peanut leaves.
Closeup of adult thrips on peanut leaf.
Rednecked peanutworm and damage on peanut.
Rednecked peanutorm in peanut bud.
Hopperburn caused by leafhoppers on peanut leaves.
Closeup of hopperburn caused by leafhoppers on peanut leaf.
Overview of hopperburn on peanut caused by leafhoppers.
Adventitious root growth on peanut caused by three-cornered alfalfa girdling.
Lesser cornstalk borer silken feeding tubes on peanut pegs.
Lesser cornstalk borer adult moths (male left, female right).
Closeup of lesser cornstalk borer larva on peanut leaf.
Whitefringed beetle grub in soil at base of peanut plant.
Spotted cucumber beetle (Southern Corn Rootworm adult) on peanut leaf.
Southern corn rootworm (Spotted cucumber beetle larva) on peanut peg.
Sugarcane beetle on finger.
Cutworm on soil curled in C-shape
Figure 2-1. Image captions in the image collection

As shown in Figure 2-1, the content of the image captions includes insects, plants, relationships between them such as damage, and environmental elements such as soil. The domain that this ontology covers is composed of crops, pests, and the relationships between them, as well as the environmental elements surrounding them. Crops (such as soybeans, peanuts, and cotton) are defined as a collection of plants grown by farmers for food or other uses. These crops also offer food or shelter for various developmental stages of insects. The term “pest” refers to any insect that damages a crop by introducing disease or causing physical and physiological changes.
The crop-pest ontology therefore represents both the external structure of crops and insects and the internal

PAGE 30

processes or events resulting from associations between them. In addition, it reflects the nature of crop-pest relationships and the environmental elements surrounding crops and pests. The ontology also includes concepts that are not directly related to crops and pests. For example, the caption “Sugarcane beetle on finger” in Figure 2-1 contains the concept “finger.” Because the purpose of the ontology is to support image retrieval, such concepts matter: images often include an object for size comparison, and in this example a human body part, a finger, serves as the scale.

Consideration of Reuse of Existing Ontologies

One advantage of using ontologies is the possibility of reusing existing ontologies: ontologies already built in the same domain can be refined and extended for a particular purpose. In the agricultural field, however, no ontology suitable for this purpose and for the crop-pest domain existed yet. The National Agricultural Library Thesaurus (NALT) covers agricultural fields, including the crop-pest domain, and can serve as a reference for building the crop-pest ontology. However, the ambiguity of the relationships in NALT prevented it from being reused directly to build the crop-pest ontology. Some general concepts in the upper level of the crop-pest ontology, such as “abstract” and “physical,” were imported from the Suggested Upper Merged Ontology (SUMO). SUMO is an upper-level ontology that provides definitions for general-purpose concepts and acts as a foundation for more specific domain ontologies (Niles and Pease, 2001). Figure 2-2 shows some of the SUMO concepts that make up the upper level of the crop-pest ontology. Most domain-specific concepts, such as “stink bug” and “peanut,” were newly created.

PAGE 31

Abstract
    attribute
        o internal attribute
        o relational attribute
    quantity
        o content-based quantity
        o physical quantity
Physical
    object
        o agent
        o collection
        o self connected object
    process
        o biological process
        o pathological process
Figure 2-2. Concepts from SUMO that compose the upper level of the crop-pest ontology.

Enumeration of Important Terms in the Crop-Pest Ontology

It is useful to list the important terms of the domain, because such a list helps ontology developers group terms manually. Image captions are a good source of terms because they are directly related to the crop-pest domain and are written specifically to describe the content of the images. The 291 image captions were tokenized, and the frequency of each term was counted. The resulting 257 terms were listed according to the frequency with which they appeared in the captions. The ten most frequent terms are shown in Figure 2-3; the others are shown in Appendix B. However, these 257 terms are not all of the terms in the domain. Terms not on this list but needed for the domain were added during the development of the ontology. For example, the term “insect” did not appear in the list. However, the term “insect” was needed to

PAGE 32

categorize concepts such as “stink bug,” “beetle,” and “armyworm,” so the term “insect” was added to the crop-pest ontology.

Term          Frequency
On            217(a)
Cotton        124
Soybean       82
Leaf          69
Photograph    68
Larva         59
Damage        57
Peanut        47
Of            43
Boll          31
Figure 2-3. The ten most frequently appearing terms in the 291 image captions. Others are shown in Appendix B. (a) indicates the frequency of the word “on” in the 291 image captions.

Building Classes and the Class Hierarchy

Noy introduced three approaches to developing a class hierarchy (Noy and McGuinness, 2001):
A top-down development process, which begins with the definition of the most general concepts in the domain and proceeds to the subsequent specialization of those concepts.
A bottom-up development process, which begins with the definition of the most specific concepts and continues with the subsequent grouping of these concepts into more general concepts.
A combination development process, which starts with a few notable top-level concepts and a few salient specific concepts.

The crop-pest ontology was developed using the combination approach. First, the development process started with a few notable top-level classes. The root of the crop-pest ontology is the class “thing,” which, according to the Web Ontology Language standard, is the root of every OWL ontology (OWL, 2005). The class “thing” is the most general concept in the ontology, and all other classes are its subclasses. Classes imported from SUMO were used as the top-level classes immediately

PAGE 33

below the class “thing” (see Figure 2-2). These SUMO classes were imported as top-level classes because they provide a foundation for more specialized classes. For example, according to the OWL standard, the specific concept “number,” like every class, is a subclass of the class “thing.” SUMO then provides two subclasses of the class “thing”: the class “abstract” and the class “physical.” The two classes are disjoint, which means that no individual of the class “abstract” can also be an individual of the class “physical.” The class “physical” represents things that have a location in space and time. Since the concept “number” is not located in space and time, it can be assigned as a subclass of the class “abstract.” The process of determining which class “number” belongs to continued down through the top-level classes until the correct location of the class was found. The middle-level classes in the crop-pest ontology are plants, related insects, and environmental objects such as soil. For example, the class “insect” became a middle-level class in the ontology. All superclasses of the class “insect” are shown in Figure 2-4.
Figure 2-4. Class hierarchy from the root class “thing” down to the class “insect” and its subclass “southern green stink bug,” showing top-level, middle-level, and bottom-level classes. Classes in blue were imported from SUMO, and classes in purple were created to describe the bottom-level classes.
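The top-down placement of the concept “number” can be sketched as a walk from the root that, at each level, follows the child class whose membership test the concept passes. This is a hypothetical illustration: the feature names (`located_in_space_time`, `is_quantity`, and so on) are invented for the example, and placing “number” under “quantity” follows SUMO's own hierarchy, whereas the text above only places it under “abstract.”

```python
# Hypothetical fragment of the upper hierarchy. Each child class carries a yes/no
# test on a concept's features; the "abstract"/"physical" tests are mutually
# exclusive, mirroring the disjointness of those two SUMO classes.
CHILDREN = {
    "thing": [("abstract", lambda c: not c["located_in_space_time"]),
              ("physical", lambda c: c["located_in_space_time"])],
    "abstract": [("quantity", lambda c: c.get("is_quantity", False)),
                 ("attribute", lambda c: c.get("is_attribute", False))],
    "physical": [("process", lambda c: c.get("is_process", False)),
                 ("object", lambda c: not c.get("is_process", False))],
}


def place(concept, cls="thing"):
    """Descend from the root, following the first child class whose test the concept passes.

    Returns the path from the root to the deepest class that still fits.
    """
    path = [cls]
    while True:
        for child, test in CHILDREN.get(cls, []):
            if test(concept):
                cls = child
                path.append(child)
                break
        else:
            return path  # no child fits: the concept belongs directly under cls


number = {"located_in_space_time": False, "is_quantity": True}
peanut = {"located_in_space_time": True}
print(place(number))  # ['thing', 'abstract', 'quantity']
print(place(peanut))  # ['thing', 'physical', 'object']
```

Because the tests for “abstract” and “physical” can never both pass, the walk can never put a concept under both classes.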

PAGE 34

The most specific classes were created bottom-up beneath the middle-level classes. For example, one kind of insect, the concept “southern green stink bug” (Figure 2-4), which appears in the image caption “southern green stink bug on cotton leaf,” was assigned as a subclass of the class “insect.” This bottom-up approach makes assigning subclasses easier because the concepts appearing in image captions are specific enough that the class each belongs to is easy to identify.

Defining Properties

Building a hierarchical structure of classes is not enough to create an ontology, because the hierarchy itself cannot fully describe any concept; a complete description requires assigning properties. As described earlier, properties can be one of two types: object properties or datatype properties. An object property describes the relationship between two individuals. For example, since an insect passes through developmental stages, the property “has_developmental_stage_of” can express the relationship between an individual of the class “insect” and an individual of one of the classes “adult,” “pupa,” “larva,” or “egg.” For this property, the class “insect” is the domain, and the classes “adult,” “pupa,” “larva,” and “egg” make up the range. A datatype property describes the relationship between an individual and a data value, such as a string or an integer. For example, the datatype property “number_of_legs” describes the fact that an insect has six legs by assigning the integer value 6.
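The difference between the two kinds of property can be sketched with Python dataclasses: “number_of_legs” holds a plain integer, while “has_developmental_stage_of” holds references to other individuals whose class must fall in the property's range. The class layout and individual names are hypothetical, and rejecting an out-of-range value is a closed-world simplification; an OWL reasoner would instead use the range to infer the linked individual's class, or report an inconsistency if that class were declared disjoint from the range.

```python
from dataclasses import dataclass, field

# Range of the object property has_developmental_stage_of, per the text.
STAGE_CLASSES = {"adult", "pupa", "larva", "egg"}


@dataclass
class Individual:
    name: str
    cls: str


@dataclass
class Insect(Individual):
    number_of_legs: int = 6  # datatype property: the value is an integer literal
    has_developmental_stage_of: list = field(default_factory=list)  # object property: values are individuals

    def add_stage(self, stage: Individual) -> None:
        """Link this insect to another individual, checking the property's range first."""
        if stage.cls not in STAGE_CLASSES:
            raise ValueError(f"{stage.cls!r} is outside the range of has_developmental_stage_of")
        self.has_developmental_stage_of.append(stage)


bug = Insect("southern_green_stink_bug_1", "insect")
bug.add_stage(Individual("larva_3", "larva"))
print(bug.number_of_legs)                                 # 6
print([s.name for s in bug.has_developmental_stage_of])  # ['larva_3']
```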

PAGE 35

Figure 2-5. The object property “has_developmental_stage_of” and the datatype property “number_of_legs” in the class “insect.”

One of the most difficult decisions made while developing the crop-pest ontology was determining the finest level of granularity to represent. Noy's guideline is that this decision depends on the potential applications of a particular ontology (Noy and McGuinness, 2001); in other words, the level of granularity is determined by the most specific concepts that a given application requires the ontology to represent. For the crop-pest ontology, the level of granularity was decided by the application: browsing 291 images. Therefore, the crop-pest ontology needed to contain at least all the concepts that appear in the image captions, and its most specific classes represent concepts appearing in those captions.

Creating Individuals

The last step of building the crop-pest ontology was creating individuals of the classes in the hierarchy. Defining an individual of a class involves the following steps:
1. Choosing a class
2. Creating an individual of the class
3. Filling out the properties of the individual

PAGE 36

For example, the individual “soybean_20” was created to represent a specific soybean. The class “soybean” has the following properties defined:
“locate_place” (Boolean)
“locate_in_time” (Boolean)
“is_host_of”
“has_parts”
“is_damaged_by”
Each property is filled with a value, as shown in Figure 2-6. The datatype property “locate_place” is filled with the Boolean value true, and the object property “is_damaged_by” is filled with the individual “yellow_stripe_armyworm_16.”
Figure 2-6. An individual of the class “soybean.” All properties are filled with values.
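The individual in Figure 2-6 can be written out as OWL data in Turtle syntax. The sketch below serializes an individual from a Python dictionary; the `cp:` prefix is hypothetical, prefix declarations are omitted, Turtle was chosen only for brevity (the serialization used in Appendix D may differ), and only the two property values stated in the text are included.

```python
def to_turtle(name, cls, values, prefix="cp"):
    """Serialize one individual and its property values as a single Turtle statement.

    Booleans become typed literals (datatype properties); strings are treated as
    the names of other individuals (object properties).
    """
    pairs = [f"a {prefix}:{cls}"]  # "a" is Turtle shorthand for rdf:type
    for prop, value in values.items():
        if isinstance(value, bool):
            obj = f'"{str(value).lower()}"^^xsd:boolean'
        else:
            obj = f"{prefix}:{value}"
        pairs.append(f"{prefix}:{prop} {obj}")
    return f"{prefix}:{name} " + " ;\n    ".join(pairs) + " ."


print(to_turtle("soybean_20", "soybean",
                {"locate_place": True, "is_damaged_by": "yellow_stripe_armyworm_16"}))
# cp:soybean_20 a cp:soybean ;
#     cp:locate_place "true"^^xsd:boolean ;
#     cp:is_damaged_by cp:yellow_stripe_armyworm_16 .
```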

PAGE 37

Based on the above methodology, 615 classes and individuals were created in the crop-pest domain.

The Crop-Pest Ontology

The crop-pest ontology contains 286 classes, 81 object properties, 36 datatype properties, and 305 individuals. It is written in OWL and is shown in Appendix D.

Validation of the Crop-Pest Ontology

Validating the crop-pest ontology meant using an OWL reasoner to detect unsatisfiable concepts and report errors. Unsatisfiable concepts are concepts that cannot be true of any possible individual. Such concepts usually result from a basic logical error during ontology development, since they cannot be used to describe any individual. Unsatisfiable concepts are also easy for a reasoner to detect and display (Parsia et al., 2005). Pellet is an open-source, Java-based OWL DL reasoner (Pellet OWL reasoner, 2005). Pellet provides utilities to determine which OWL sublanguage, such as OWL Full or OWL DL, an ontology uses, to check ontology consistency, to classify the taxonomy, and so on. The Pellet OWL reasoner was used to check the consistency of the crop-pest ontology, and Figure 2-7 shows the result.
Figure 2-7. Results of the consistency check of the crop-pest ontology using the Pellet OWL reasoner.

Evaluation of the Crop-Pest Ontology

A complete ontology not only supports its intended application and functions properly but can also be reused in developing other ontologies. Therefore, the

PAGE 38

evaluation of ontologies is essential. There are two approaches to ontology evaluation: qualitative and quantitative (Brewster et al., 2004). In the qualitative approach, an ontology developer with knowledge of a particular domain is asked to evaluate an ontology from the perspective of established principles. For example, Gomez claimed that the lack of methods for evaluating ontologies could be an obstacle to their use in several application domains (Gomez, 1999). He suggested some ideas for evaluating ontologies technically, especially the definitions of their classes. Evaluating class definitions is a technical evaluation that must be performed throughout ontology development, and its purpose is to discover deficiencies in the defined classes, individuals, and properties. First, the structure of the ontology should be checked against the criteria that definitions should have clear, necessary, and sufficient conditions and should be written in a formal language; in addition, the definitions should be logically consistent (Gruber, 1993). Second, the syntax of the definitions should be checked for syntactically incorrect structures, such as loops between definitions. Third, the content of the definitions should be checked to detect what the ontology defines, does not define, or defines incorrectly; what can be, cannot be, or may be inferred; and what may be inferred incorrectly. Finally, Gomez presented three case studies of ontology evaluation: the evaluation of definitions, of the hierarchy, and of properties. Kohler reported an evaluation of existing ontologies used in molecular biology data sources (Kohler and Schulze-Kremer, 2002). He technically checked the stability of the concepts, the validity of the hierarchy, and how widely the ontologies were used.
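Gomez's second check, looking for loops between definitions, can be automated for the class hierarchy with a depth-first search over subclass links. The sketch below is an illustrative assumption, not part of the tooling used in this dissertation; `subclass_of` maps each class to its direct superclasses, and the example class names are hypothetical.

```python
def find_cycle(subclass_of):
    """Return a list of classes forming a subclass loop, or None if the hierarchy is acyclic.

    subclass_of maps each class to the list of its direct superclasses.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {}
    stack = []

    def visit(cls):
        color[cls] = GRAY
        stack.append(cls)
        for sup in subclass_of.get(cls, []):
            state = color.get(sup, WHITE)
            if state == GRAY:  # back edge: sup is already on the current path
                return stack[stack.index(sup):] + [sup]
            if state == WHITE:
                found = visit(sup)
                if found:
                    return found
        stack.pop()
        color[cls] = BLACK
        return None

    for cls in list(subclass_of):
        if color.get(cls, WHITE) == WHITE:
            found = visit(cls)
            if found:
                return found
    return None


print(find_cycle({"insect": ["organism"], "organism": ["physical"], "physical": ["thing"]}))
# None
print(find_cycle({"pest": ["insect"], "insect": ["organism"], "organism": ["pest"]}))
# ['pest', 'insect', 'organism', 'pest']
```

In OWL, a loop of subclass assertions is not a logical contradiction: it makes the classes in the loop equivalent. A detected loop is therefore best treated as a warning for the developer to review rather than as an error.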

PAGE 39

The quantitative approach is a data-driven approach to ontology evaluation that tests how well an ontology covers a domain-specific corpus, that is, a collection of domain-related terms. Brewster chose the art and artists domain, for which he had developed the ARTEQUAKT ontology, and then collected 41 arbitrary texts from the Internet on a number of artists. He compared the ARTEQUAKT ontology with SUMO and the Ontology of Science (Ontology of Science, 2005), even though SUMO and the Ontology of Science did not cover the same domain as ARTEQUAKT; at the time, he was unable to find any ontology covering that domain because ontology development research was still in its early stages. He tested how much of the corpus each ontology covered to determine how appropriate each was for representing the knowledge of the domain represented by the corpus.

The crop-pest ontology was evaluated using the quantitative approach, testing the coverage of the ontology against a domain-specific corpus. The first step was to find an ontology that covered the same domain. The AGROVOC ontology has high coverage of the agricultural domain (AGROVOC ontology, 2005), including the crop-pest domain, so the crop-pest ontology was compared with the AGROVOC ontology. The second step of this evaluation was to collect an arbitrary domain-specific corpus. As mentioned before, most concepts in the crop-pest ontology came from DDIS image captions that described crops and related pests. Therefore, similar image captions from the same domain were good candidates for the domain-specific corpus. Two restrictions were applied in selecting captions for testing: the captions could not have been used in developing the crop-pest ontology, and the captions all had to relate to cotton and its pests. The Texas Agricultural Extension Service provided 95 image captions

PAGE 40

that contained a domain-specific corpus (Texas Agricultural Extension Service, 2005). Fifty of those 95 captions were selected randomly, and the 138 terms appearing in those 50 captions were used to test the crop-pest ontology, as shown in Appendix C. The coverage of the 138 terms is shown in Table 2-1. The coverage of the crop-pest ontology was higher than that of AGROVOC, showing that the crop-pest ontology was the closer fit to the selected domain-specific corpus. This indicates how appropriate the crop-pest ontology is for representing the knowledge of the domain represented by the selected texts.

Table 2-1. The coverage of 138 tested terms in the crop-pest and AGROVOC ontologies.
                          The crop-pest ontology    AGROVOC
Coverage (percentage)     44.93%                    30.43%

Conclusion and Discussion

The methodology for developing the crop-pest ontology is based on three fundamental rules: 1) the best method involves focusing on the intended application; 2) after an initial version of an ontology is defined, it is necessary to revise the ontology by using it in applications and/or by discussing it with experts in the field; and 3) concepts in the ontology should be close to the objects (physical or logical) and relationships in the domain. The procedure for developing the crop-pest ontology was as follows:
1. Determination of the purpose and domain of the crop-pest ontology
2. Consideration of the reuse of existing ontologies
3. Enumeration of important terms in the crop-pest ontology
4. Determination of the classes and the class hierarchy
5. Definition of properties
6. Creation of individuals

The crop-pest ontology was developed according to this procedure. It contains 286 classes, 81 object properties, 36 datatype properties, and 305 individuals, and it is written in OWL. Validating the crop-pest ontology was essential to find unsatisfiable concepts in it. The Pellet OWL reasoner was used for this validation, and it reported the crop-pest ontology to be consistent. Evaluation as well as validation of the crop-pest ontology was important because, in addition to supporting their original applications by functioning properly, complete ontologies can also be reused to develop other ontologies. The data-driven evaluation showed that the crop-pest ontology covers a domain-specific corpus well when compared with AGROVOC, a well-known ontology that covers all agricultural domains. However, the data-driven approach has some limitations compared with the qualitative approach, in which ontology developers check the definitions of classes, properties, and individuals: the data-driven approach cannot check the logical correctness of those definitions. An OWL reasoner could establish logical correctness only if it supported reasoning over all classes, properties, and individuals. At the time of writing, the Pellet OWL reasoner supported consistency checks and some query processing. An evaluation could also be designed around the purpose of the crop-pest ontology, which is browsing 291 images in a collection. If users agree that browsing images through the crop-pest ontology is more convenient than conventional ways of browsing images, that agreement would be another evaluation of the crop-pest ontology.
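The coverage figures in Table 2-1 reduce to simple arithmetic: 44.93% and 30.43% of 138 terms correspond to 62 and 42 covered terms, respectively (62/138 = 44.93%, 42/138 = 30.43%, rounded). The Python sketch below shows the coverage measure itself. It assumes that a corpus term counts as covered when it matches an ontology term exactly after trimming and lowercasing; the original evaluation does not state its matching rule, and the function name `coverage` and the toy term lists are hypothetical.

```python
def coverage(corpus_terms, ontology_terms):
    """Return the percentage of corpus terms found in an ontology's vocabulary.

    Assumption (not from the original study): a term is covered when it
    matches an ontology term exactly, ignoring case and surrounding spaces.
    """
    if not corpus_terms:
        raise ValueError("the corpus must contain at least one term")
    vocabulary = {term.strip().lower() for term in ontology_terms}
    covered = sum(1 for term in corpus_terms if term.strip().lower() in vocabulary)
    return 100.0 * covered / len(corpus_terms)


if __name__ == "__main__":
    # Toy term lists, for illustration only.
    corpus = ["cotton", "boll weevil", "cotton aphid", "square"]
    ontology = ["Cotton", "Boll weevil", "Plant"]
    print(f"{coverage(corpus, ontology):.2f}%")  # 2 of 4 terms -> 50.00%

    # The Table 2-1 percentages, recomputed from the implied term counts.
    for name, covered in [("crop-pest ontology", 62), ("AGROVOC", 42)]:
        print(f"{name}: {100 * covered / 138:.2f}%")
```

Exact matching is the strictest choice; a real evaluation might also accept synonyms or plural forms, which would raise both percentages.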

CHAPTER 3

A PRACTICAL COMPARISON BETWEEN THESAURUS AND ONTOLOGY TECHNIQUES AS A BASIS FOR SEARCH IMPROVEMENT

Introduction

Jacob argued that a controlled vocabulary[3] is itself an ontology, so long as the standard concept of a controlled vocabulary is similarly redefined (Jacob 2003). Chun pointed out that thesauri and ontologies share common traits: they describe domain-specific knowledge; contain terms (or concepts) and relations among those terms; make use of hierarchical structures; are used in information management applications to catalog and retrieve information; and need to be maintained and revised constantly. Yet he also stressed that they are not the same, because a thesaurus has a limited number of relations, which can make it relatively poor at expressing specific knowledge (Chun 2004). Another difference is that the two systems have different points of emphasis. Whereas ontology builders are primarily concerned with how software and associated machines interact with ontologies in a logical way, thesaurus developers (such as librarians) instead focus on how users retrieve information with the aid of a thesaurus (Jibbajabba 2002). The limited number of relations and the different points of emphasis do not cover all the differences between thesauri and ontologies; most notably, they omit differences arising from the characteristics of the languages used to describe domain knowledge. In this chapter, we explore additional differences between thesauri and ontologies, not only describing the differences themselves but also providing specific examples that explicitly reveal how thesauri are not ontologies. We selected the National Agricultural Library Thesaurus (NALT) as the thesaurus of study, because it covers agricultural domains and has performed well as a controlled vocabulary in several well-known agricultural information systems. Likewise, we developed an ontology that covers crops and related insects, hereafter referred to as the "crop-pest ontology," as a practical domain-specific ontology. We then performed a practical comparison between NALT and the crop-pest ontology. NALT is described further below; the crop-pest ontology was introduced in Chapter 2. Next, the mechanics and results of the practical comparison are presented, followed by the conclusions of the comparison and a discussion of the respective abilities of NALT and the crop-pest ontology in agricultural information systems.

[3] A controlled vocabulary is the same as a thesaurus.

The National Agricultural Library Thesaurus

The National Agricultural Library Thesaurus (NALT) is intended for indexing materials and for aiding retrieval in agricultural information systems. The thesaurus was prepared by staff of the National Agricultural Library (NAL) to meet the needs of the United States Department of Agriculture (USDA) and the Agricultural Research Service (ARS) for an agricultural thesaurus (NALT 2005). NALT is the controlled vocabulary of AGRICOLA, NAL's bibliographic database of citations to agricultural resources. The Food Safety Research Information Office (FSRIO) and the Agricultural Network Information Center (AgNIC) use NALT as the controlled vocabulary of their information systems. NALT is also used for browsing within the ARS and AgNIC web sites (NALT 2005).

NALT is structured into 17 subject categories. These categories are derived from the NAL Agricultural Classification Prototype, originally developed for the Agricultural Information Network.[4] The subject scope of agriculture is broadly defined in NALT and includes terminology related to the supporting biological, physical, and social sciences.

[4] http://www.agnic.org

Relationships between Terms

NALT includes hierarchical, equivalence, and associative relationships among concepts. Hierarchical relationships are indicated by Broader Term and Narrower Term designations in the thesaurus. This hierarchical structure of relationships is a distinguishing feature of the thesaurus, in contrast to a simple list of alphabetically ordered terms. Broader terms represent more general concepts than narrower terms:

Crop yield
  Broader Terms: crop production, yields
  Narrower Terms: grain yield, yield components

"Grain yield" is subordinate to "crop yield" since it is a more specific type of crop yield. Similarly, "crop yield" belongs to the larger concept class of "yields." This relationship suggests that a searcher interested in "crop yield" would also be interested in specific types of crop yields such as "grain yield" and "yield components."

Equivalence relationships are designated by Use and Used for cross-references. Equivalence is established when two or more terms represent the same (or nearly the same) concept, e.g., synonymous terms, common names of organisms and their scientific names, spelling variants, usage variants, and acronyms:

Mechanical damage
  Used for: mechanical injuries

As shown here, "mechanical injuries" is a synonymous term for "mechanical damage." The reciprocal relationship appears as follows:

Beetles
  Use: coleoptera

Here, "coleoptera" is the scientific name and preferred term, while "beetles" is a common name that directs users to the more appropriate term for indexing and retrieval. In general, NALT directs users from non-preferred terms to the more appropriate descriptors for indexing and retrieval through its "Use" and "Used for" designations.

Associative relationships are designated by reciprocal Related Terms. An associative relationship is established between terms that are conceptually related but whose relationship is neither hierarchical nor equivalent. Associative relationships alert indexers and searchers that there are other related concepts in the thesaurus that may be of interest to them:

Insects
  Related Terms: insecticides

Here, "insecticides" is a related concept to "insects," because insecticides are chemical substances used to kill insects.

Comparative Analysis

In this section, we compare NALT and the crop-pest ontology to further analyze their differences. The main points of comparison are the representation of domain knowledge and the faculty of reasoning based on that representation. The representation of domain knowledge is a crucial feature of both systems, because each must extract domain knowledge and describe it using components such as concepts, terms, relations, or properties. We compare NALT and the
crop-pest ontology by showing how each represents knowledge and then testing their abilities to describe knowledge of domain concepts. Then, we examine the reasoning facilities of each technology. Reasoning is the use of logical expressions by agents such as humans or machines to find results or draw conclusions (Encarta 2005). Whenever agents can access and process well-structured knowledge, they can make logical inferences on their own and deduce conclusions from that knowledge. We illustrate specific applications of reasoning such as ontology validation and search.

NALT covers all of agriculture, including crops and related insects. Because the crop-pest ontology covers only crops and associated pests, the comparative analysis was restricted to crops and associated pests, called the "crop-pest domain." A total of 631 concepts were generated from the crop-pest ontology database for the comparative analysis. Some of NALT's 41,000 preferred terms (terms for indexing and searching) and related terms (terms that have relationships with those preferred terms) covering the crop-pest domain were examined during this comparison.

Representing Domain Knowledge

Concepts, Semantic Relationships, and Their Logical Consistency

We explored how best to represent domain knowledge using concepts and semantic relationships in both NALT and the crop-pest ontology. We selected the concept Plant, which is the basic concept in our domain of focus (the crop-pest domain), and examined the concept Plant and its relationships to other concepts on the basis of logical consistency. In NALT, the term plants is represented through relationships including broader term (BT), narrower term (NT), and related term (RT), as shown in Figure 3-1.

Figure 3-1. Description of the term plants in NALT ("…" denotes additional terms not shown)

The crop-pest ontology, however, has a wider variety of relationships between concepts. Superclass and subclass relationships, for instance, indicate the hierarchical position of a particular concept. In addition, properties represent various relationships between concepts or between concepts and data values (such as strings and integers). The representation of the concept Plant in the semantic structure of the crop-pest ontology is shown in Figure 3-2.

There are several highly visible differences between the representations of the same concept Plant in NALT and in the crop-pest ontology. First, the formal language of the ontology provides machine-readable information, whereas the informality of NALT means that a machine cannot readily process the information within the thesaurus itself. Second, subclasses in the ontology should be logically defined using "necessary and sufficient (if possible) conditions," including those inherited from their superclasses, whereas NALT merely asserts BT/NT relationships. In Figure 3-2, plant is a specific case of organism, which means that anything that is a plant must also satisfy the conditions it inherits from organism, such as the characteristics of a living thing (replication, metabolism, etc.).
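The second difference can be illustrated with a small Python sketch: when each class records its direct superclass and the conditions it adds, a program can collect every condition a member must satisfy, whereas a bare BT/NT link carries no conditions to inherit. Only the link from plant to organism and the condition labels paraphrased from this chapter (replication, metabolism, photosynthesis, cellulose cell walls) come from the text; the dictionaries and function names are hypothetical, and real OWL classes would be handled by an OWL reasoner rather than by code like this.

```python
# Hypothetical fragment of a class hierarchy: each class maps to its direct
# superclass (None for the root) and to the conditions it adds itself.
SUPERCLASS = {"organism": None, "plant": "organism"}
OWN_CONDITIONS = {
    "organism": {"replication", "metabolism"},
    "plant": {"photosynthesis", "cellulose cell walls"},
}


def all_conditions(cls):
    """Collect the conditions of cls plus those inherited from its superclasses."""
    conditions = set()
    while cls is not None:
        conditions |= OWN_CONDITIONS.get(cls, set())
        cls = SUPERCLASS[cls]
    return conditions


def can_be_member(cls, observed_traits):
    """A candidate qualifies only if it shows every (inherited) condition."""
    return all_conditions(cls) <= set(observed_traits)


if __name__ == "__main__":
    print(sorted(all_conditions("plant")))
    # A candidate lacking the organism-level traits cannot be a plant.
    print(can_be_member("plant", {"photosynthesis", "cellulose cell walls"}))
```

Walking up the superclass chain is what "inherited conditions" amounts to here; a BT/NT pair offers nothing comparable to walk.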

a photosynthetic organism that has cellulose cell walls, cannot move of its own accord, grows on the earth or in water, and usually has green leaves.
Kingdom Plantae

Figure 3-2. Description of the concept Plant from the crop-pest ontology, written in Web Ontology Language (OWL).[5] The concept Plant is defined as a class in OWL. The class Plant is a subclass of the class Organism. The class Plant is disjoint with the classes Animal, Fungus, Protozoan, and Bacterium. The class Plant can have several properties; e.g., the property is_host_of indicates that a plant is a host of something else. The domain of the property is_host_of is the class Plant; similarly, the range of the property is_host_of is the class Insect. The properties is_host_of and is_pest_of are inverses of each other.

[5] The precise description of the OWL language is shown at http://www.w3.org/TR/owl-ref/.

In Figure 3-1, scions are defined as an NT of plants. However, a scion is not a plant but rather a part of a plant used for grafting. It is therefore not logically consistent to treat a scion as a specific kind of plant.

Third, the semantic structure of related terms (RT) in NALT differs from that of the corresponding properties in the crop-pest ontology. RT neither explains nor represents what kind of relationship exists between terms. For example, plants has an RT of flora, but NALT does not explain the nature of the relationship itself, which could perhaps be "a plant is an element of flora." As shown in Figure 3-2, a property called is_element_of in the crop-pest ontology has restrictions on its values: its domain and range can further reveal the logical consistency of stored knowledge. These restrictions give the crop-pest ontology the additional ability of logic-based consistency validation.

The Ability to Represent Complicated Concepts

In trying to represent knowledge in the crop-pest domain, thesaurus or ontology developers are confronted with a descriptive difficulty. For example, the concept "large corn earworm larva on peanut leaf" is relatively complicated to represent. Here we explore the ability of NALT and the crop-pest ontology to represent complicated concepts by creating the concept "large corn earworm larva on peanut leaf." To begin with, we state the explicit meanings of the given concept:

The size of the corn earworm is large.
The developmental stage of the corn earworm is the larval stage.
This corn earworm is located on a peanut leaf.

Then, we examine how to describe these meanings in both NALT and the crop-pest ontology. In NALT, terms related to the crop-pest domain are selected from the given
concept, such as corn earworm, larva, peanut, and leaf. For example, peanut and larva would be selected as terms, as shown in Figure 3-3.

Corn earworm
  Use: Helicoverpa zea
Related Terms: Larva peanut leaf

Figure 3-3. Three terms, corn earworm, peanut, and larva, in the NALT thesaurus

During the selection of terms from the given concept, the explicit meanings within the concept are lost. For example, we cannot determine how big the larva is or where the larva is located.
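This loss of meaning can be made concrete with a short Python sketch. Looking terms up in a thesaurus reduces the phrase to an unordered set of terms, while a structured description keeps each value in a named role. The small term list and the USE reference from corn earworm to Helicoverpa zea follow Figure 3-3, and the property names follow Figure 3-5; the substring matching, the variable names, and the function name are simplifying assumptions made for illustration.

```python
# Thesaurus fragment following Figure 3-3: candidate terms, plus a USE
# reference from a non-preferred term to its preferred term.
THESAURUS_TERMS = {"corn earworm", "larva", "peanut", "leaf"}
USE = {"corn earworm": "Helicoverpa zea"}


def thesaurus_index(phrase):
    """Return the preferred terms that occur in phrase, as an unordered set.

    Naive substring matching is an assumption made for brevity; it would,
    for example, also match "leaf" inside "leafhopper".
    """
    text = phrase.lower()
    return {USE.get(term, term) for term in THESAURUS_TERMS if term in text}


# The same phrase as a structured description (property names follow Fig. 3-5).
individual = {
    "type": "corn_earworm",
    "has_size": "large",
    "has_developmental_stage": "larva",
    "locate_In": "peanut leaf",
}

if __name__ == "__main__":
    phrase = "large corn earworm larva on peanut leaf"
    print(sorted(thesaurus_index(phrase)))
    # The set cannot say where the larva is; the structured description can.
    print(individual["locate_In"])
```

The set of terms also drops "large" entirely, because size is not a thesaurus term here; in the structured description it survives as the value of has_size.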

Understanding the words in the example phrase depends on understanding them syntactically and semantically, which means identifying which word in a given phrase is the main word and which words modify it. In this example, we identified the main word (i.e., the head) of the given concept, "large corn earworm larva on peanut leaf," as "corn earworm." The head "corn earworm" was therefore represented as a class in the crop-pest ontology. The other words, called modifiers, modify the head "corn earworm." These modifiers were created as properties with a domain and range, as shown in Figure 3-4.

Property              Domain   Range
Size of               Insect   concepts that indicate sizes, such as large, medium, and small
Development stage of  Insect   concepts that indicate developmental stages of insects, such as egg, larva, pupa, and adult
Locate in             Insect   Plant or Part of Plant

Figure 3-4. Properties assigned to the class corn earworm for describing the concept "large corn earworm larva on peanut leaf"

Then, an individual of the class "corn earworm" called "corn earworm larva1" was created, with these properties filled in as in Figure 3-5. In this way, we formally generated the given concept "large corn earworm larva on peanut leaf" using an individual of the class "corn earworm" in the crop-pest ontology, without losing the explicit meanings.

Individual(a:corn_earworm_larva1
  type(a:corn_earworm)
  value(a:is_pest_of a:peanut1)
  value(a:locate_In a:peanut1)
  value(a:has_size a:large1)
  value(a:has_developmental_stage a:larva1))
Individual(a:larva1 type(a:larva))
Individual(a:large1 type(a:large))
Individual(a:peanut1 type(a:peanut)
  value(a:is_host_of a:corn_earworm_larva1))

Figure 3-5. The OWL abstract form of the individual of the class "corn earworm" that represents the complicated concept "large corn earworm larva on peanut leaf"[6]

[6] "a" denotes a particular namespace, at http://www.owl-ontologies.com/unnamed.owl

Reasoning Based on Representation

Reasoning Facilities

In the NALT thesaurus, the relationship between BT and NT could support a simple inference based on generalization and specialization. In Figure 3-1, plants is treated as a specific kind of organism through its BT. Algae and seaweeds are likewise treated as specific kinds of plants through NT, provided that the NALT thesaurus can be converted into a formal language and that its BT/NT/RT relationships are consistent. In the crop-pest ontology, however, the ability to reason goes well beyond generalization and specialization. The concept beetle in the crop-pest ontology inherits all properties and associations from its superclass insect, based on generalization/specialization rules. Here, we explain an inference with the following example from the crop-pest ontology. We first assert three true propositions in the crop-pest domain:

Corn earworm is an insect that can damage peanut plants.
An insect is an agent.
A peanut pest is an agent that can damage peanut plants.

Based on these three propositions, we can infer, using existential quantification, that corn earworm is also one of the peanut pests. The steps to perform the same inference in the crop-pest ontology are as follows. We define a class corn earworm asserting the first proposition (see Figure 3-6, Part A). The class agent is the union of organisms, such as insects, bacteria, or fungi, and non-organisms, such as viruses or prions, that cause events (see Figure 3-6, Part B). Through the hierarchy of classes, we show that corn earworm is an insect, invertebrate, animal, organism,
and agent, by a process of reasoning. The class peanut pest is defined as the intersection of agent and the restriction that its members damage some peanut through the property damage_to_peanut (see Figure 3-6, Part C). Therefore, a reasoner automatically deduces that the class corn earworm is a subclass of peanut pest, as shown in Figure 3-6, Part D. Note that this deduction relies on the existential restriction in Part A, which states that every corn earworm damages some peanut; a domain declaration alone only classifies whatever uses damage_to_peanut as a corn earworm and would not entail the subsumption.

A) Class(a:corn_earworm partial a:insect
     restriction(a:damage_to_peanut someValuesFrom(a:peanut)))
   ObjectProperty(a:damage_to_peanut domain(a:corn_earworm) range(a:peanut))
B) Class(a:agent complete unionOf(a:non_organism a:organism))
   ObjectProperty(a:make_event_of domain(a:agent) range(a:event))
   Class(a:animal partial a:organism)
   Class(a:invertebrate partial a:animal)
   Class(a:insect partial a:invertebrate)
C) Class(a:peanut_pest complete
     intersectionOf(restriction(a:damage_to_peanut someValuesFrom(a:peanut)) a:agent))
D)

Figure 3-6. The OWL abstract form for the inference "corn earworm is a peanut pest." A, B, and C show how the ontology describes the propositions in the domain. D is a diagram of the result of the inference. A circle indicates a class; a line indicates relationships between classes and subclasses; a dotted line denotes a
property of two classes; and a red line indicates that corn earworm has been placed under peanut pest as the result of the inference. In addition to the above examples, many other logical inferences can be made from the entries in the crop-pest ontology.

Searching Documents

The NALT thesaurus brings more relevant documents into a search by providing query expansion. A keyword from an end user is first matched against NALT to find relevant terms, such as those listed under Used for. Once a relevant term is selected, it is added to the keyword search of documents in a publication. This query expansion can return more relevant documents to end users and improve search ability, but the expansion can also bring in more irrelevant documents. In addition, browsing the controlled vocabulary in NALT gives users who lack an exact keyword clues for finding relevant documents.

In the crop-pest ontology, searching is a reasoning process. Whenever end users search for information using keywords within the crop-pest ontology, the ontology executes a reasoning process to find answers. This can produce better results, because the ontology returns relevant information using knowledge that is not only asserted manually by experts (as in NALT) but also inferred automatically. For example, we searched for "peanut pest" in both NALT and the crop-pest ontology. As Figure 3-7 shows, the preferred term in NALT closest to "peanut pest" was "plant pests." NALT can provide relevant terms for the search and bring in further relevant information only once experts have asserted peanut pests in it. We showed above how the ontology deduces that corn earworm is a peanut pest. Using a similar method of deduction, we can get information on all peanut pests, using
deduction as well as any manual assertions such as "A is a peanut pest."

Figure 3-7. Screen shot of the preferred term "plant pests" in NALT

Automatic Validation of Logical Consistency

The validation of NALT and the crop-pest ontology is one of the most important issues in building better applications for searching information and cataloguing documents. If the thesaurus or the ontology is not established to be logically correct, we cannot expect good results from any application that relies on it. In NALT, all terms and relationships are validated from the point of view of field experts. For example, throughout the development of the first edition of the NALT thesaurus, Agricultural Research Service (ARS 2005) scientists and specialists in the field of agriculture validated it manually.

In the crop-pest ontology, all classes, properties, domains, ranges, and individuals were verified by experts. In addition, the deductive ability of the crop-pest ontology can itself assist in the validation process, because deduction is inference executed by a machine. Once the crop-pest ontology is developed, we can validate it automatically by checking that the deductions drawn from its components, such as classes and properties, are logically correct. Automatic validation of classes and properties was performed by Pellet, a reasoner written in Java and designed specifically for OWL reasoning (Pellet 2005). Pellet checked the consistency of the crop-pest ontology and reported any unsatisfiable classes within it, as demonstrated in Figure 3-8.

Figure 3-8. Screen shots of the OWL consistency checker and its results in Pellet. A shows the front page of the OWL consistency checker. B shows the consistency of the crop-pest ontology.
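To illustrate the kind of problem such a check reports, the Python sketch below flags a class as unsatisfiable when its superclasses include both members of a disjoint pair. It is a toy that handles only told subclass links and pairwise disjointness; Pellet performs complete OWL reasoning, which this code does not attempt. The disjointness of plant with animal and fungus follows Figure 3-2; the class plant_part, its disjointness from organism, and the flawed axiom making scion a subclass of both plant and plant_part (a literal reading of the NALT narrower-term link discussed earlier) are hypothetical.

```python
# Told subclass links: class -> set of direct superclasses.
SUPERCLASSES = {
    "organism": set(),
    "plant": {"organism"},
    "animal": {"organism"},
    "fungus": {"organism"},
    "plant_part": set(),  # hypothetical class
}
# Pairs of classes that share no members.
DISJOINT = [
    {"plant", "animal"},
    {"plant", "fungus"},
    {"plant_part", "organism"},  # hypothetical axiom
]


def ancestors(cls, superclasses):
    """Return cls together with every class reachable through subclass links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(superclasses.get(current, ()))
    return seen


def unsatisfiable_classes(superclasses, disjoint_pairs):
    """List classes whose ancestors include both classes of a disjoint pair."""
    return sorted(
        cls
        for cls in superclasses
        if any(pair <= ancestors(cls, superclasses) for pair in disjoint_pairs)
    )


if __name__ == "__main__":
    print(unsatisfiable_classes(SUPERCLASSES, DISJOINT))  # []
    # Reading "scions" as a narrower term (subclass) of "plants" while also
    # modeling it as a plant part makes the class unsatisfiable.
    flawed = dict(SUPERCLASSES, scion={"plant", "plant_part"})
    print(unsatisfiable_classes(flawed, DISJOINT))  # ['scion']
```

An unsatisfiable class cannot have any members, so a reasoner reporting one points directly at a modeling error such as the scion case.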

Conclusion

The differences between a thesaurus and an ontology were explored through a practical comparison between NALT and the crop-pest ontology. The fundamental differences in how they represent domain knowledge were the formality of the language of the crop-pest ontology, the logical consistency of its concepts and relationships, and its explicit description of relationships, which NALT lacks. Formality allows a machine to process the information within the ontology more readily. Previous research has studied converting thesauri into formal languages such as the Resource Description Framework (RDF), though it reported several conversion problems arising from the relations in the thesauri (Matthews 2002). The logical consistency of concepts and properties brings with it a greater capacity for automated reasoning. We analyzed the ambiguity of relations in the thesaurus, in particular broader term (BT) and narrower term (NT), which can inadvertently be used in an ambiguous fashion (e.g., to state that a particular concept is a special case of another concept, or that a particular concept is part of yet another concept).

The differences between the data representations of the two technologies lead to different levels of reasoning power in their applications. In the NALT thesaurus, relationships such as BT or NT support only simple inference based on generalization and specialization, assuming the underlying relationships are valid. In the crop-pest ontology, however, reasoning goes well beyond generalization and specialization. It supports the deduction of conclusions from true propositions, the search for information as a result of inference, and the automatic validation of the logical consistency of the ontology. We conclude that, of the two studied
systems, an ontology provides the better representation of domain knowledge and greater reasoning power based on that representation, which could improve the searching of documents in agricultural publications.
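The deduction and inference-based search summarized above can be approximated in a few lines of Python. The sketch follows the axioms of Figure 3-6: corn earworm is an insect that damages some peanut, organism is a subclass of agent because agent is the union of organism and non-organism, and peanut pest is defined as an agent that damages some peanut. It uses simple structural subsumption over named superclasses and someValuesFrom restrictions, which suffices for this example but is not a general OWL reasoner. The three cornered alfalfa hopper (a soybean pest from Chapter 4) is included without a peanut restriction purely for contrast; the data layout and function names are hypothetical.

```python
# Told subclass links (class -> direct superclasses), following Figure 3-6.
# organism -> agent holds because agent is the union of organism and non_organism.
SUPERCLASSES = {
    "agent": set(),
    "organism": {"agent"},
    "non_organism": {"agent"},
    "animal": {"organism"},
    "invertebrate": {"animal"},
    "insect": {"invertebrate"},
    "corn_earworm": {"insect"},
    "three_cornered_alfalfa_hopper": {"insect"},  # contrast case
    "peanut": set(),
}
# someValuesFrom restrictions: class -> {(property, filler class)}.
SOME_VALUES = {"corn_earworm": {("damage_to_peanut", "peanut")}}
# Defined (complete) classes: name -> (named conjuncts, existential conjuncts).
DEFINED = {"peanut_pest": ({"agent"}, {("damage_to_peanut", "peanut")})}


def ancestors(cls):
    """Return cls and all of its told superclasses, transitively."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, ()))
    return seen


def inherited_restrictions(cls):
    """Collect someValuesFrom restrictions from cls and its ancestors."""
    return {r for a in ancestors(cls) for r in SOME_VALUES.get(a, ())}


def is_subsumed_by_definition(cls, definition):
    """Check cls against a definition made of named and existential conjuncts."""
    named, existentials = definition
    held = inherited_restrictions(cls)
    return named <= ancestors(cls) and all(
        # "some p.F'" implies "some p.F" when F' is a subclass of F
        any(p == q and filler in ancestors(f) for q, f in held)
        for p, filler in existentials
    )


def search(defined_class):
    """Return the named classes inferred to fall under a defined class."""
    definition = DEFINED[defined_class]
    return sorted(c for c in SUPERCLASSES if is_subsumed_by_definition(c, definition))


if __name__ == "__main__":
    print(search("peanut_pest"))  # ['corn_earworm']
```

Removing the corn_earworm entry from SOME_VALUES makes the search return an empty list: the classification depends on the existential restriction, not on the class hierarchy alone.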

CHAPTER 4

BUILDING A DATABASE AND GRAPHICAL USER INTERFACE FOR BROWSING IMAGES BASED ON THE CROP-PEST ONTOLOGY

Introduction

The most common interface for image retrieval lets users type keywords and see search results in a table ordered by relevance. This type of interface, such as Google Image Search (Google Image Search, 2005), performs well with a large pool of Web images, but it does not allow users to browse images (Yee, 2003). Markkula reported that professionals in artistic fields such as journalism, design, and art direction use browsing as a basic strategy in searching for images, for four reasons. First, browsing aids in the development of illustration ideas. Second, some words describing the selected images may be difficult to come up with as search keywords but are easily applied once the images are seen. Third, image selection depends on the particular work situation, which is difficult to anticipate in indexing. Fourth, artistic professionals feel comfortable with browsing. Similarly, growers and other professionals in the agricultural field would want to be able to browse images as well as search for them with keywords.

Indexing images is vitally important to image browsing. One commonly used approach is to index images with a thesaurus (Hyvonen, 2004): an image can be categorized by a thesaurus that classifies different aspects of images into hierarchical categories. However, a thesaurus turns out to provide only part of the knowledge needed,
when a knowledge-rich description of images is required for indexing them. Wielinga stressed that a structured, knowledge-based description of images is much richer than a traditional "set of terms" such as a thesaurus (Wielinga, 2001). As shown in Chapter 3, an ontology can remedy the deficiencies of a thesaurus in describing domain knowledge by using a formal representation of concepts. Hyvonen implemented an ontology-based image retrieval system for a photo exhibition using the promotion image database of the Helsinki University Museum (Hyvonen, 2003). In that system, images were annotated according to a promotion ontology, and the ontology helped users formulate their queries. Schreiber reported a system for ontology-based photo indexing and searching in an image collection about apes, including chimpanzees, gorillas, and orangutans (Schreiber, 2001); he developed a domain ontology for annotating ape photographs.

As shown in Chapter 2, the crop-pest ontology was developed for browsing 291 images in a collection, and it can also be used as a tool for cataloging those images. This chapter describes how the images were indexed using the crop-pest ontology, introduces the implementation of a graphical interface for browsing the images, and presents a usability study.

Ontology-Based Image Indexing

Several studies have used ontologies as tools for indexing information. Desmontils explored indexing Web pages using a terminology-oriented ontology and presented a semi-automatic process for indexing a Web site with the ontology (Desmontils, 2002). He argued that an ontology-based indexing approach could provide more precise retrieval within a given Web site. Tsinaraki indexed audiovisual information, such as images and videos, with an ontology.
Tsinaraki suggested that indexing multimedia information with the same ontologies across different multimedia standards has the advantage of interoperability across applications (Tsinaraki, 2005). Tsinaraki also pointed out the most interesting aspect of ontology-based indexing: through the use of domain-specific ontologies, the approach provides not only simple retrieval of audiovisual content with a keyword query but also enhanced content-based retrieval with semantic queries, such as "give me video clips where the players Ronald or Beckham appear." Since user requests in the agricultural field involve complex as well as simple concepts based on domain knowledge, ontology-based image indexing is a promising approach to fulfilling user needs.

Creating Concepts for Indexing Images in the Crop-Pest Ontology

New classes, properties, and instances were added to the crop-pest ontology to index the 291 images with the ontology. First, the class "digital photograph" was created in the crop-pest ontology, because each real image became an instance of that class. The class inherits common properties from its upper classes: "thing," "physical," "object," "self-connected object," "substance," and "photograph." Figure 4-1 shows the hierarchy from the root class "thing" down to "digital photograph." The class "digital photograph" has several subclasses, such as "pest photograph," also shown in Figure 4-1.

Second, several properties were assigned to the "digital photograph" class. A manual analysis of the 291 image captions revealed four common relationships, which were assigned to four properties of the "digital photograph" class:

damage: any process caused by an agent
agent (pest): insects that cause damage
host: plants that are attacked by insects
location: the place where plants or insects are located

Third, each image was defined as an individual of the class "digital photograph." Based on the content of each image, the values of each property were specified for each individual during the process of indexing images, as explained in the next section.

The Indexing Process

This process builds a structured index of images according to the crop-pest ontology. The indexing process can be divided into five steps, shown in Figure 4-2.

Step 1: Syntactic and Semantic Analysis of Each Image Caption

All 291 images have captions in which domain experts describe the content of the images. The image captions (Figure 4-2, A) were analyzed both syntactically and semantically. Syntactic analysis is based on the grammatical structure of the captions, and semantic analysis is based on the meaning of each word in a caption and on the relationships among word meanings established by the syntax. These analyses extract the domain knowledge implied by words or phrases in the image captions, as well as the content of the images. For example, the image caption shown in Figure 4-2, "damage caused by three cornered alfalfa hopper on soybean," was analyzed syntactically and semantically. As shown in Figure 4-3, the word "damage" is the head, or main word, of the phrase. It is followed by two modifiers that qualify its meaning: the past-participial phrase "caused by three cornered alfalfa hopper," used like an adjective phrase, and the prepositional phrase "on soybean." The semantic analysis was based on the syntactic analysis. The first modifier indicates the agent that causes the damage, the specific pest "three cornered alfalfa hopper." The second modifier describes not only the location

PAGE 63

of the damage, "soybean." Because the content of the image is restricted to the crop-pest domain, it also implies that soybean is the host of the pest "three cornered alfalfa hopper." These syntactic and semantic analyses of the caption provided the information needed to index the image in the crop-pest ontology.

Step 2: Creating an Individual of the Image in the Crop-Pest Ontology

Each image becomes an individual of the class "digital photograph" in the crop-pest ontology, because each image is a real, specific example of that class. For example, the image with the caption "damage caused by three cornered alfalfa hopper on soybean" was created as an individual of the class "digital photograph." This individual, called "Image – SOY006," represents that image.

Step 3: Filling in the Values of Properties for Each Individual

Each individual created in Step 2 has the four properties defined for the class "digital photograph." Their values were filled in with the information provided by the syntactic and semantic analyses. For example, the individual "Image – SOY006" from Step 2 was assigned the following property values:
damage: damage
agent (pest): three cornered alfalfa hopper
host: soybean
location: soybean

Step 4: Connecting Assigned Values to Classes/Individuals in the Crop-Pest Ontology

The values assigned to the four properties of each individual were connected to classes and individuals in the crop-pest ontology. This mapping of property values onto the ontology gives access to domain knowledge while browsing through
classes and individuals within the ontology. For example, the individual "Image – SOY006" has "three cornered alfalfa hopper" as the value of the property "agent." This value was mapped to the class "three cornered alfalfa hopper" in the crop-pest ontology. That class then shows 1) what the pest is and 2) how it relates to other classes and individuals, which exposes the relevant domain knowledge. Likewise, the value "soybean" was mapped to the class "soybean."

Step 5: Saving the Individual into the Crop-Pest Ontology

The index of each image built in Steps 1 through 4 was saved into the crop-pest ontology as an individual.

An Interface for Browsing Images

Goals

The first goal was to give growers, county agents, and other agricultural users an interface that would help them locate images better than a keyword-based search interface. Users do not necessarily know what keyword to type to find an image. A keyword-based approach works when users know the kinds of images in a collection or the concepts needed to retrieve relevant images. A browsing facility can help users who are unfamiliar with an image collection to retrieve images easily and effectively. Users must still guide themselves carefully by the textual or graphical indications of the content each link leads to (Olston, 2003).

The second goal was to let users acquire the domain knowledge described by the crop-pest ontology while browsing concepts or images. A facility that shows users the relationships among the concepts in the domain of an image collection
can address the limitations of keyword-based interfaces, which do not provide such domain knowledge.

Features to be Supported

The graphical user interface must provide five features:
First, a facility for browsing images, for users who are not familiar with the image collection or who lack domain knowledge.
Second, a facility to visualize the hierarchical structure of classes in the crop-pest ontology, because the classes in the ontology are arranged in a hierarchy.
Third, a facility to show the properties of the crop-pest ontology. Properties that express relationships between individuals, or between individuals and data values, should be shown to users.
Fourth, a facility to show the relationships of each image with concepts in the crop-pest ontology. Each image is related to other concepts in the ontology, and the interface should present them.
Fifth, a facility to find all images related to a particular class in the crop-pest ontology.

The Graphical Interface

All the features required to achieve these goals were implemented in the interface. First, the browsing facility was built by modifying TouchGraph, an open-source program for data visualization (TouchGraph, 2005). The interface is shown in Figure 4-4, with the browsing facility in its center. The display contains a graph whose nodes are rectangles, ovals, or image thumbnails. A rectangular node represents a class in the crop-pest ontology, and an oval node represents an individual. Properties are displayed as
edges, which appear as gray lines connecting nodes in Figure 4-4. Clicking a node expands or hides its neighboring nodes, so users control how many nodes they see. Users can browse images by following the expansion of nodes, as shown in Figure 4-5.

The hierarchical structure of classes is visualized through the colors and shapes of nodes. Classes are represented by rectangular nodes in one of three colors. An orange rectangle is the selected class, a blue rectangle is a superclass, and a green rectangle is a subclass. The interface is intended for growers and other professionals who may not know the components of the crop-pest ontology. It therefore avoids ontology terms such as class, subclass, and superclass. Instead, the interface states the meaning of the colors ("current selected: orange, more general: blue, more specific: green"), as shown in Figure 4-5.

Properties of the crop-pest ontology are shown as edges. There are two kinds of edges: edges without a label and edges with a label. An edge with a label represents a property in the crop-pest ontology, as shown in Figure 4-6. The thick end of a labeled edge indicates the domain of the property, and the thin end indicates its range. For example, Figure 4-6 shows the property "is pest of" as a labeled edge highlighted in black. The domain of the property is the class "insect" and its range is the class "plant." Accordingly, the thick end of the black edge indicates the domain, "insect," and the thin end indicates the range, "plant."
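The color legend and the domain/range convention described above can be illustrated with a minimal Python sketch. The class hierarchy is a hypothetical fragment. Only "insect," "plant," "stink bug," "soybean," and the property "is pest of" appear in the text. The sketch assumes that blue and green apply to any more general or more specific class, not only to direct neighbors. `node_color`, `conforms`, and `PropertyEdge` are illustrative names, not part of TouchGraph or the actual interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical fragment of the crop-pest class hierarchy: class -> direct superclass.
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "organism": "thing",
    "insect": "organism",
    "stink bug": "insect",
    "green stink bug": "stink bug",
    "plant": "organism",
    "soybean": "plant",
}


def is_ancestor(general: str, specific: str) -> bool:
    """True if `general` is a strict superclass of `specific`."""
    cls = PARENT.get(specific)
    while cls is not None:
        if cls == general:
            return True
        cls = PARENT.get(cls)
    return False


def is_a(cls: str, other: str) -> bool:
    """True if `cls` equals `other` or is one of its subclasses."""
    return cls == other or is_ancestor(other, cls)


def node_color(cls: str, selected: str) -> Optional[str]:
    """Legend from the interface: orange = current selection,
    blue = more general, green = more specific.
    Unrelated classes get no color (an assumption of this sketch)."""
    if cls == selected:
        return "orange"
    if is_ancestor(cls, selected):
        return "blue"
    if is_ancestor(selected, cls):
        return "green"
    return None


@dataclass(frozen=True)
class PropertyEdge:
    label: str
    domain_cls: str  # drawn at the thick end of the labeled edge
    range_cls: str   # drawn at the thin end of the labeled edge


IS_PEST_OF = PropertyEdge("is pest of", domain_cls="insect", range_cls="plant")


def conforms(edge: PropertyEdge, subject_cls: str, object_cls: str) -> bool:
    """Check that a link respects the property's domain and range."""
    return is_a(subject_cls, edge.domain_cls) and is_a(object_cls, edge.range_cls)
```

With "stink bug" selected, `node_color("insect", "stink bug")` gives blue and `node_color("green stink bug", "stink bug")` gives green. A "green stink bug is pest of soybean" link conforms to the property because its subject falls under the domain "insect" and its object falls under the range "plant."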


A facility to show the relationships of each image with concepts in the crop-pest ontology was implemented. When an image is selected, all concepts related to it are shown, which gives background knowledge about the image as well as its content. Users can also widen their search of the image collection by following related concepts. For example, selecting the image marked by the black arrow in Figure 4-7 shows its related concepts and properties. These concepts include bean leaf beetle, damage, and soybean.

A facility to find all images related to a particular class in the crop-pest ontology was also implemented. It serves users who want to see all images related to a class without browsing nodes. The user enters a term, the facility matches it to a class by comparing it with class names, and then shows all images related to that class. For example, in Figure 4-8 the user typed "stink bug" into the input box at the top left of the screen. The term was matched to the class "stink bug" in the crop-pest ontology. Clicking the button "show all images about stink bug" at the top right of the interface opens a pop-up window. The window shows all images that were manually assigned to individuals of the class "stink bug" and its subclasses.

Usability Study

The graphical interface was designed to help growers, county agents, and other agricultural users locate images better than with a keyword-based search interface. It was also developed to let users acquire domain knowledge by browsing the crop-pest ontology while searching for images. The usability study was planned to evaluate the interface against these two objectives.
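The "show all images about stink bug" facility described earlier in this section can be sketched in Python under stated assumptions. The typed term is matched to a class by a case-insensitive comparison with class names. Each image is indexed by the class of its agent value. The extra class names, the image identifiers, and the helper names are hypothetical and are not taken from the system.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fragment of the ontology: class -> direct subclasses.
SUBCLASSES: Dict[str, List[str]] = {
    "stink bug": ["green stink bug", "brown stink bug"],
    "green stink bug": [],
    "brown stink bug": [],
    "three cornered alfalfa hopper": [],
}

# Hypothetical index built during image indexing: image individual -> agent class.
IMAGE_AGENT: Dict[str, str] = {
    "IMG-A": "stink bug",
    "IMG-B": "green stink bug",
    "IMG-C": "three cornered alfalfa hopper",
}


def match_class(term: str) -> Optional[str]:
    """Match a typed term to a class by comparing it with class names."""
    wanted = term.strip().lower()
    for name in SUBCLASSES:
        if name.lower() == wanted:
            return name
    return None


def with_subclasses(cls: str) -> Set[str]:
    """Return `cls` plus all of its direct and indirect subclasses."""
    found, stack = {cls}, [cls]
    while stack:
        for child in SUBCLASSES.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def images_about(term: str) -> List[str]:
    """All images whose agent is the matched class or one of its subclasses."""
    cls = match_class(term)
    if cls is None:
        return []
    classes = with_subclasses(cls)
    return sorted(img for img, agent in IMAGE_AGENT.items() if agent in classes)
```

For example, `images_about("Stink Bug")` returns `["IMG-A", "IMG-B"]`. The second image is included because "green stink bug" is a subclass of "stink bug." A term that matches no class name returns an empty list.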


The Keyword-Based Search Interface

A keyword-based search interface, called the "baseline interface," was built for comparison with the graphical interface. It uses Egothor (Egothor, 2005), an open-source, full-featured text search engine written entirely in Java. For indexing, 291 Web pages were generated. Each page contains an image file (.jpg) and its caption as text. Figure 4-9 shows the keyword-based interface and a Web page containing an image and its caption. The main page of the baseline interface provides an input box for search keywords and one paragraph describing the domain. After users enter keywords, a list of links to matching Web pages is shown. Clicking a page in the list displays its image and caption.

Hypotheses to Be Tested

Based on the two goals of the interface, two hypotheses were tested.
Hypothesis 1: The graphical interface will produce higher precision and recall than the baseline interface.
Hypothesis 2: The graphical interface will help users learn more about crops, pests, and the relationships between them.

Design and Procedure

Hypothesis 1 was tested by evaluating the precision and recall of the graphical interface and the keyword-based interface in three stages. In the first stage, the search terms used to determine precision and recall were selected. A set of 134 terms was collected from the queries users typed to search documents in the Electronic Data Information Source (EDIS), an agricultural information system. From these 134 terms, seven were drawn based on the range of images covered by both interfaces. The selected terms were
shown in Table 4-1. In the second stage, both interfaces were queried with the selected search terms. Finally, the data were analyzed.

Hypothesis 2 was tested by evaluating user satisfaction in a between-subjects design. Data from 20 participants were used in the analysis. Nineteen were from the Department of Agricultural and Biological Engineering at the University of Florida, and one was from the Department of Horticulture. This preliminary study was conducted with agricultural researchers: 3 faculty members, 1 staff member, and 16 graduate students. All participants were Internet users who searched online for information and for images either every day or a few times per week. Each participant was randomly assigned to use either the baseline interface or the graphical interface. In practice, users receive no explanation before using an interface, so participants were not introduced to the interface's features. They were told only that both interfaces accessed the same image collection about crops and related pests. Throughout the study, subjective ratings were reported on a 4-point scale (strongly agree, agree, disagree, strongly disagree). A sample question is: "Did this interface help you learn more about crops, pests, and relationships between them?" After finishing image retrieval with their interface, participants completed an evaluation. All questions asked are listed in Appendix F.

Results

The precision and recall of the seven search terms in both interfaces are shown in Table 4-1. The statistical analysis comparing the precision of the graphical
interface and the baseline interface showed that the precision of the graphical interface is higher than that of the baseline interface (t = -2.11, p = .02). The recall of the graphical interface is also higher than that of the baseline interface (t = -2.10, p = .02). Hypothesis 1 was therefore supported.

Hypothesis 2 was tested by analyzing the responses to the question "Did this interface help you learn more about crops, pests, and relationships between them?" Table 4-2 shows the individual responses. The statistical analysis shows no significant difference between the two interfaces (t = 2.9, p = .46), so Hypothesis 2 was not supported. However, the participants' responses indicate some difference, so a more thorough evaluation is warranted. This evaluation used a between-subjects design, which treats differences between subjects (i.e., between participants) as error. A within-subjects design can instead treat those differences as a variable. In addition, the number of participants (five) is small. Moreover, a 4-point Likert scale might not be sensitive enough to capture participants' responses precisely. A more thorough evaluation should therefore use 1) a within-subjects design, 2) more participants, and 3) a wider range of subjective ratings.

Conclusion

Ontology-based image indexing was carried out using the crop-pest ontology. The new class "digital photograph" was created in the crop-pest ontology to index 291 images, and several properties of "digital photograph" were assigned. The process of indexing images is as follows:
Syntactic and semantic analysis of each image caption
Creating an individual of the image in the crop-pest ontology
Filling in the values of properties for each individual
Connecting assigned values to classes/individuals in the crop-pest ontology
Saving the individuals into the crop-pest ontology

After the 291 images were indexed, a graphical interface for browsing them was designed. It gives growers, county agents, and other agricultural users a tool to locate images better than with a keyword-based search interface. The interface also aims to let users acquire the domain knowledge described by the crop-pest ontology. The graphical interface enables users to:
browse images
visualize the hierarchical structure of classes in the crop-pest ontology
view the properties of the crop-pest ontology
view the relationships of each image with concepts in the crop-pest ontology
find all images related to a particular class

A preliminary usability study was conducted for the graphical interface. Data from 10 participants were used in this analysis. For comparison, a conventional keyword-based search interface was built. The following hypotheses were tested, based on the goals of the graphical interface:
Hypothesis 1: The graphical interface will produce higher precision and recall than the baseline interface.
Hypothesis 2: The graphical interface will help users learn more about crops, pests, and the relationships between them.

To test Hypothesis 1, seven search terms were drawn from user queries to EDIS. The precision and recall of the graphical interface were higher than those of the baseline interface. Because precision and recall measure the effectiveness of searching for
information, the higher precision and recall indicate that growers, county agents, and other agricultural users can locate images better with the graphical interface than with the keyword-based search interface.

To test Hypothesis 2, a usability study was conducted using a between-subjects design. Each participant used either the keyword-based search interface or the graphical interface based on the crop-pest ontology. Throughout the study, subjective ratings were reported on a 4-point Likert scale. Participants were asked, "Did this interface help you learn more about crops, pests, and relationships between them?" Eight of the ten participants using the graphical interface agreed that it helped them learn more about the domain. Of the ten participants using the keyword-based search interface, five agreed and two strongly disagreed. The statistical analysis did not support Hypothesis 2. However, the participants' responses suggest that a more thorough evaluation of the graphical interface is worthwhile. Such a study would use 1) a within-subjects design, 2) more participants, 3) a wider range of subjective ratings, and 4) task-based image retrieval.
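For reference, precision is the share of retrieved images that are relevant, and recall is the share of relevant images that are retrieved. Table 4-1 reports both as percentages. A minimal Python sketch follows. It assumes the function returns 0 when nothing is retrieved or nothing is relevant. That convention is an assumption of the sketch and is not necessarily the one behind Table 4-1.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision %, recall %) for one search term.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    An empty denominator yields 0.0 (assumed convention).
    """
    hits = len(retrieved & relevant)
    precision = 100.0 * hits / len(retrieved) if retrieved else 0.0
    recall = 100.0 * hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative only: 2 images retrieved, both relevant, out of 4 relevant images.
print(precision_recall({"a", "b"}, {"a", "b", "c", "d"}))  # (100.0, 50.0)
```

Because the two measures pull in different directions, the study reports them separately. An interface can reach perfect precision by returning very few images while its recall stays low.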


Table 4-1. Results of evaluating precision and recall in both interfaces

                   Graphical interface          Baseline interface
Search term        Precision (%)  Recall (%)    Precision (%)  Recall (%)
Aphid              0              0             100            50
Bug                0              0             100            58
Green stink bug    0              0             0              100
Laying egg         0              0             0              0
Plant              50             3             100            11.6
Stink bug          100            100           100            100
White fly          0              0             100            100

Table 4-2. Participants' responses to the question "Did this interface help you learn more about crops, pests, and relationships between them?"

Response           Interface using keyword   Interface using ontology
Strongly agree     0                         0
Agree              5                         8
Disagree           3                         1
Strongly disagree  2                         0
N/A                0                         1
Total              10                        10

Figure 4-1. Hierarchy from the root class "thing" to "digital photograph."


Figure 4-2. Overview of indexing images associated with the crop-pest ontology.

Figure 4-3. Syntactic and semantic analysis of the image caption "damage caused by three cornered alfalfa hopper on soybean." A) Result of the syntactic analysis. B) Result of the semantic analysis.
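The indexing steps overviewed in Figure 4-2 can be summarized in a minimal Python sketch, under several assumptions. Step 1 (the syntactic and semantic analysis, done manually in this chapter) is taken as already complete and yields a dictionary of property values. A value is connected to the ontology only when it exactly matches an existing class name. The `Individual` type, the `index_image` function, and the in-memory `store` are hypothetical stand-ins for the ontology. They are not the tool actually used.

```python
from dataclasses import dataclass, field
from typing import Dict

PROPERTIES = ("damage", "agent", "host", "location")

# Hypothetical subset of class names already in the crop-pest ontology.
ONTOLOGY_CLASSES = {"damage", "three cornered alfalfa hopper", "soybean"}


@dataclass
class Individual:
    name: str
    cls: str
    values: Dict[str, str] = field(default_factory=dict)  # property -> value text
    links: Dict[str, str] = field(default_factory=dict)   # property -> ontology class


def index_image(image_id: str, analysis: Dict[str, str],
                store: Dict[str, Individual]) -> Individual:
    # Step 2: the image becomes an individual of "digital photograph".
    ind = Individual(image_id, "digital photograph")
    # Step 3: fill in the four properties from the caption analysis.
    for prop in PROPERTIES:
        if prop in analysis:
            ind.values[prop] = analysis[prop]
    # Step 4: connect each value to a matching class in the ontology.
    for prop, value in ind.values.items():
        if value in ONTOLOGY_CLASSES:
            ind.links[prop] = value
    # Step 5: save the individual (here, into an in-memory dictionary).
    store[image_id] = ind
    return ind


# Step 1 output for "damage caused by three cornered alfalfa hopper on soybean".
analysis = {"damage": "damage", "agent": "three cornered alfalfa hopper",
            "host": "soybean", "location": "soybean"}
store: Dict[str, Individual] = {}
soy006 = index_image("Image – SOY006", analysis, store)
print(soy006.links["agent"])  # three cornered alfalfa hopper
```

Keeping the raw value (`values`) separate from its ontology link (`links`) makes it easy to spot values that still need a manual mapping.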


Figure 4-4. The interface that provides a facility to browse 291 images. The browsing facility is located in the center of the interface.

Figure 4-5. Expanding a node (orange rectangle) shows an image. The black arrow (not part of the interface) points at the definition of node colors.


Figure 4-6. A facility for showing properties. A property is shown by an edge with a label. The thick end of each edge represents the domain, and the thin end represents the range.


Figure 4-7. The facility that shows selected images with their related concepts in the crop-pest ontology.


Figure 4-8. The facility that shows all images related to a particular class in the crop-pest ontology.


Figure 4-9. Screenshots of the keyword-based search interface implemented with Egothor. A) The main page for searching images. B) The Web page of an image and its caption, "stink bug nymphs on soybean pod." This Web page was used for indexing the image.


CHAPTER 5 ONTOLOGY-ASSISTED INFORMATION EXTRACTION

Introduction

Indexing image captions with the crop-pest ontology is a process of extracting important information from the captions through syntactic and semantic analysis. The information is then saved in a structured form that a reasoner can access and apply reasoning processes to. Chapter 4 described the manual indexing of image captions. This chapter introduces a semi-automatic approach to creating indexes based on syntactic and semantic analysis of natural language.

Creating an index can be redefined as the process of extracting relevant information from text and building a formal semantic structure for it. This process is called information extraction. It identifies useful information in natural language text from a particular domain and converts it to a structured form that can be saved into a database (Cardie, 1997). In ontology-assisted image retrieval, the structured form produced by information extraction is created as an individual of the class "image" inside the crop-pest ontology. SMES is one information extraction system that uses domain-specific knowledge from an ontology (Maedche, 2002). SMES maps free text into a domain-specific ontology that contains target knowledge structures. These structures capture the information needed to answer questions such as who, what, whom, when, where, or why, and they are predefined by the given ontology.


Manual information extraction has high time and labor costs. Domain experts must analyze the syntactic and semantic structure of natural language text using their domain knowledge in order to create the structure. The semi-automatic approach suggested here is based on a natural language parser, phrase patterns, and the crop-pest ontology. Parsing is the process of analyzing a continuous stream of input, such as text, to determine its grammatical structure with respect to a given set of grammar rules. A parser is a computer program that carries out this task. Phrase patterns provide syntactic and semantic information to the parser. The crop-pest ontology describes crops and related pests formally, as discussed in Chapter 2. Together, the three components analyze natural language text and create semantic structures automatically. The approach is semi-automatic rather than automatic for two reasons: the phrase patterns were created by humans, and the automatically created structures require some manual validation.

This chapter is organized as follows. The components of the information extraction system are introduced: 1) the ontology, 2) phrase patterns, and 3) the island chart parser. Semantic structures, the outputs of the system, are then described. Finally, the results of the information extraction system are presented and discussed.

Components in Information Extraction System

The information extraction system consists of three components (Figure 5-1):
1) the crop-pest ontology, as a source of domain-specific knowledge;
2) an island chart parser, which parses input strings and converts them into semantic structures;
3) phrase patterns, which provide a domain-specific grammar for the island chart parser.
The inputs of the system are untagged natural language text from the captions of the 291 images. The outputs of the system are
semantic structures that contain information about the input text associated with the crop-pest ontology.

Figure 5-1. The information extraction system in the crop-pest domain. The semantic structure, mapped from the natural language text of a given caption, can be classified as an individual of a particular class inside a domain-specific ontology.

Ontology

Knowledge of the crop-pest domain is a vital component of an information extraction system. Domain experts with extensive experience can build this knowledge. However, every domain expert presents knowledge differently, so a formalized form is necessary. One such form is an ontology, as introduced in Chapter 2.

Phrase Patterns

Phrase patterns are domain-specific patterns that represent the meaning of phrases (or single words) as well as their syntactic structure. They specify relationships among phrases as ATTRIBUTE and HEAD. The HEAD is the main concept, appearing as a single word in the phrase being analyzed, and the ATTRIBUTE is a modifier of the HEAD. For example, to represent "cotton leaf," we can create a phrase pattern such as <np plant ATTRIBUTE: plant, np plant part HEAD>. In the phrase
pattern, np is a category of noun phrase that can be resolved using other noun-phrase patterns. In the case of "cotton leaf," each np in the pattern resolves to a single noun. The phrase pattern also describes the semantic structure of the phrase and the role of each word in that structure, as shown by the labels plant and plant part. It expresses the relationship between "cotton" and "leaf" as ATTRIBUTE and HEAD, since a leaf is part of a cotton plant. The same phrase pattern applies to other phrases, such as "cotton stem," "cotton root," and "peanut leaf." As shown in Figure 5-2, phrase patterns are organized in a hierarchical structure. and become the most specific phrase patterns in the hierarchy. The phrase pattern in the above example is a child of the phrase pattern of , which is a child of .

These phrase patterns can be used as a grammar. A grammar is a formal description of the structures acceptable in a language (Allen, 1995). One type of grammar is context-free grammar, roughly defined as a grammar in which the left-hand side of each rule is a single symbol. Context-free grammar is expressive enough to capture most structures in natural language (Allen, 1995), but it is highly ambiguous. Phrase patterns can reduce such ambiguity by using words in contexts specific to a certain domain.

Figure 5-2. The hierarchical structure of phrase patterns for the concept "plant part." A phrase pattern <n cotton ATTRIBUTE: cotton, n leaf HEAD> is a more specific pattern than , because cotton is a specific plant and a leaf is a specific plant part.
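To make the ATTRIBUTE/HEAD idea concrete, the following Python sketch matches two-word phrases against the "cotton leaf" patterns. It simplifies the system in three assumed ways. Words are checked against ontology concepts directly rather than through np/n categories. The small is-a table is hypothetical. Choosing the deepest (most specific) matching pattern is an assumed way of using the hierarchy in Figure 5-2, not necessarily the system's own rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical is-a links taken from the ontology: concept -> parent concept.
IS_A: Dict[str, str] = {"cotton": "plant", "peanut": "plant",
                        "leaf": "plant part", "stem": "plant part", "root": "plant part"}


def is_a(word: str, concept: str) -> bool:
    """True if `word` is `concept` or lies below it in the is-a chain."""
    node: Optional[str] = word
    while node is not None:
        if node == concept:
            return True
        node = IS_A.get(node)
    return False


@dataclass(frozen=True)
class PhrasePattern:
    name: str
    slots: Tuple[Tuple[str, str], ...]  # (required concept, role) per word
    parent: Optional[str] = None        # name of the more general pattern


PATTERNS: List[PhrasePattern] = [
    PhrasePattern("plant-part-np", (("plant", "ATTRIBUTE"), ("plant part", "HEAD"))),
    PhrasePattern("cotton-leaf", (("cotton", "ATTRIBUTE"), ("leaf", "HEAD")),
                  parent="plant-part-np"),
]
BY_NAME = {p.name: p for p in PATTERNS}


def depth(pattern: PhrasePattern) -> int:
    """Number of ancestors; deeper patterns are more specific."""
    d = 0
    while pattern.parent is not None:
        pattern = BY_NAME[pattern.parent]
        d += 1
    return d


def match(words: List[str]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return the most specific matching pattern and the role of each word."""
    candidates = [p for p in PATTERNS
                  if len(p.slots) == len(words)
                  and all(is_a(w, concept) for w, (concept, _) in zip(words, p.slots))]
    if not candidates:
        return None
    best = max(candidates, key=depth)  # prefer the most specific pattern
    return best.name, {role: w for w, (_, role) in zip(words, best.slots)}


print(match(["peanut", "leaf"]))  # ('plant-part-np', {'ATTRIBUTE': 'peanut', 'HEAD': 'leaf'})
```

"cotton leaf" matches both patterns, and the deeper "cotton-leaf" pattern is chosen. "peanut leaf" matches only the general pattern. "leaf cotton" matches neither, because a leaf is not a plant.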


Island Chart Parser

A chart parser is a parser that uses a set of grammatical rules, a dictionary that lists the possible grammatical senses of each word, and a data structure called a "chart." A chart parser normally reads from a starting point, usually the beginning of a sentence, and extends the parse rightward. Because it works in only one direction from that starting point, it cannot recognize fragments, or "islands," of words in the middle of a sentence. The island chart parser was therefore introduced to parse bidirectionally: it can extend a parse both leftward and rightward from an island.

The island chart parser analyzes the syntactic and semantic structure of an input string with domain-specific rules. Here, the phrase patterns serve as the domain-specific grammar. The parser consists of nodes, edges, and a chart. A node is a point between two words in the input string being parsed, and it records all edges coming into or going out of it. An edge is a data structure that represents a complete or partial parse of the input. It applies a particular rule and is labeled by that rule. A chart is a linear list of nodes that retains all edges. Figure 5-3 shows an edge with its rule and four nodes in a chart.

The parsing procedure of the island chart parser has three steps. The first step initializes the chart with edges by looking up each term in the input and retrieving all patterns associated with it. The second step cycles over the edges generated in the first step. During this step, each edge is checked to see whether it can be expanded to satisfy its associated rule. If so, the edge becomes a complete edge: the rule associated with the edge is completely satisfied.
Otherwise, it becomes an incomplete edge: the rule associated with the edge is not yet completely satisfied. A complete edge can also be used to extend other incomplete edges in a pending edge list. The third step builds parse trees from the edges that have been parsed completely on the chart. The parser may produce more than one parse tree, since the ambiguity of natural language allows a phrase to be interpreted in several ways (Seiffert, 1987). For each parse tree, a semantic structure is created to represent the meaning of the phrase.

Semantic Structure

Semantic interpretation is a process that translates the parse trees output by the parser into structured forms, called semantic structures. A semantic structure is an object representing a concept that can be automatically classified and added to the ontology. Each semantic structure:
1) contains concepts associated with the crop-pest ontology;
2) is saved in the crop-pest ontology as an instance of the HEAD concept of the input string;
3) can be used for retrieving images.

Indexing images is a process in which important information is extracted from the image captions based on syntactic and semantic analysis. The information extraction system (phrase patterns, the crop-pest ontology, and the island chart parser) created semantic structures for the image captions through such analysis. They provide a rich description of each image, and they are also indirectly related to concepts associated with each caption. The semantic structures could therefore serve as information-rich indexes for retrieving images.
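The three-step procedure can be sketched in Python. This is a deliberately simplified illustration, not the parser used in this work:

- Rules are plain category sequences standing in for phrase patterns.
- Edges carry left and right dots as in Figure 5-3.
- The cycling step is a fixed-point loop rather than a pending edge list.
- The sketch stops at collecting complete constituents instead of building parse trees and semantic structures.

The lexicon, rule names, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    lhs: str              # category built when the rule is fully satisfied
    rhs: Tuple[str, ...]  # categories the rule expects, left to right


@dataclass(frozen=True)
class Edge:
    rule: Rule
    start: int  # chart node where the covered words begin
    end: int    # chart node where the covered words end
    left: int   # left dot: rhs[left:right] has been matched
    right: int  # right dot

    def complete(self) -> bool:
        return self.left == 0 and self.right == len(self.rule.rhs)


Constituent = Tuple[str, int, int]  # (category, start node, end node)


def island_parse(words: List[str], lexicon: Dict[str, List[str]],
                 rules: List[Rule]) -> Set[Constituent]:
    # Step 1: initialization; look up each term (unknown words add nothing).
    found: Set[Constituent] = {(cat, i, i + 1)
                               for i, w in enumerate(words)
                               for cat in lexicon.get(w, [])}
    edges: Set[Edge] = set()
    while True:  # Step 2: cycle until no new edges or constituents appear
        new_edges: Set[Edge] = set()
        for cat, s, e in found:
            # Seed an island: a constituent may match ANY symbol of a rule.
            for rule in rules:
                for k, sym in enumerate(rule.rhs):
                    if sym == cat:
                        new_edges.add(Edge(rule, s, e, k, k + 1))
            # Extend incomplete edges leftward or rightward by this constituent.
            for ed in edges:
                rhs = ed.rule.rhs
                if ed.left > 0 and rhs[ed.left - 1] == cat and e == ed.start:
                    new_edges.add(Edge(ed.rule, s, ed.end, ed.left - 1, ed.right))
                if ed.right < len(rhs) and rhs[ed.right] == cat and s == ed.end:
                    new_edges.add(Edge(ed.rule, ed.start, e, ed.left, ed.right + 1))
        new_edges -= edges
        edges |= new_edges
        # Complete edges become new constituents that can extend other edges.
        new_found = {(ed.rule.lhs, ed.start, ed.end)
                     for ed in edges if ed.complete()} - found
        found |= new_found
        if not new_edges and not new_found:
            return found  # Step 3 (building parse trees) is omitted here


LEXICON = {"damage": ["damage"], "on": ["prep"], "soybean": ["plant"],
           "cotton": ["plant"], "leaf": ["plant_part"]}
RULES = [Rule("np_plant_part", ("plant", "plant_part")),
         Rule("np_damage", ("damage", "prep", "plant"))]

# "photo" and "of" are unknown, yet the island "cotton leaf" is still found.
result = island_parse("photo of cotton leaf".split(), LEXICON, RULES)
print(("np_plant_part", 2, 4) in result)  # True
```

Edges can be seeded at any symbol of a rule, not only the first. As a result, "cotton leaf" is recognized even when surrounded by words no rule covers, and "damage on soybean" can be assembled outward from its middle word.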


Figure 5-3. Components of the island chart parser with a simple input string. The diagram shows only one edge, but there are many other edges for the phrase being analyzed. An edge has two dots (left and right) that show how many symbols in the pattern have been applied. The left dot counts the symbols applied so far in the leftward direction, and the right dot counts those applied so far in the rightward direction.

Results of the Information Extraction System

The information extraction system for the crop-pest domain was tested with 150 phrases from image captions. The island chart parser traversed the input strings using the crop-pest ontology and the phrase patterns. The left side of Figure 5-4 shows one of the resulting parse trees. The right side shows the semantic structure it was converted to. The semantic structure contains 1) words associated with the crop-pest ontology (bold), 2) semantic relationships among them (italic), and 3) numbers identifying the synset that represents the appropriate meaning of each word. The HEAD (underlined) of the phrase "cotton stainer adult in a white cotton bloom" is the concept "adult." Therefore, the semantic structure was saved as an instance of the concept "adult" in the crop-pest ontology. This instance of "adult" can be used to search for "adult" directly or for related words indirectly. Of the 150 phrases, 130 were parsed and converted to semantic structures. The island chart parser could not parse the remaining 20 phrases because they contained words that were not associated with any phrase pattern.
These phrases fall into three groups:
1) 13 phrases containing plural nouns, even though the system includes phrase patterns that can determine the singular form;
2) two phrases with conjunctions such as "and";
3) five phrases with added parenthetical explanations, such as "wireworm larva (click beetle grubs) on peanut photograph."
The first two groups could be handled by adding more patterns for detecting conjunctions and plural nouns. Phrases in the third group could be parsed by treating the parenthetical phrases as synonyms. In the crop-pest domain, the information inside parentheses usually gives other common names or scientific names of an insect. These are synonyms of the original name (e.g., wireworm larva = beetle grub), and the crop-pest ontology can represent synonyms. These changes should allow the information extraction system to parse all 150 phrases.

Figure 5-4. A parse tree (left) and the semantic structure (right) derived from it for the caption "cotton stainer adult in a white cotton bloom." The numbers in the right panel indicate that each term was mapped to a class in the crop-pest ontology.
adult 46
location: in 296
plant part: bloom 82
plant: cotton 74, color: white 916, determiner: a 731
organism: stainer 615
host: cotton 74
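The remedies discussed above for the first and third groups could be prototyped as a preprocessing step before parsing. The text itself proposes adding phrase patterns for plurals. The hypothetical Python sketch below instead normalizes words before parsing, so it is an alternative, not the fix proposed in this work. It splits off parenthetical explanations so they can be recorded as synonyms rather than parsed. It reduces plural nouns to singular forms using a naive suffix rule and a small irregular-plural table, both of which are assumptions.

```python
import re
from typing import List, Tuple

# Matches a parenthetical explanation together with the space before it.
PAREN = re.compile(r"\s*\(([^)]*)\)")

# Hypothetical irregular plurals common in crop-pest captions.
IRREGULAR = {"larvae": "larva", "pupae": "pupa", "leaves": "leaf"}


def split_parenthetical(caption: str) -> Tuple[str, List[str]]:
    """Remove parenthetical explanations and return them separately,
    so they can be recorded as synonyms rather than parsed."""
    synonyms = [s.strip() for s in PAREN.findall(caption)]
    return PAREN.sub("", caption).strip(), synonyms


def singularize(word: str) -> str:
    """Naive singular form; a real system would consult the ontology."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 3:
        return w[:-3] + "y"  # flies -> fly
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]        # nymphs -> nymph
    return w


text, syns = split_parenthetical("wireworm larva (click beetle grubs) on peanut photograph")
print(text)  # wireworm larva on peanut photograph
print(syns)  # ['click beetle grubs']
print(" ".join(singularize(w) for w in "stink bug nymphs on soybean pods".split()))
# stink bug nymph on soybean pod
```

The suffix rule is only a placeholder. Words such as "grass" are protected by the "ss" check, but other irregular forms would need table entries or ontology lookups.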

Conclusion

An information extraction system was developed using the crop-pest ontology and phrase patterns as domain-specific knowledge sources. The system provides a semiautomatic approach to creating indexes based on syntactic and semantic analysis of natural language. It was tested with 150 image captions, of which 130 phrases (86.67%) were parsed and converted to semantic structures successfully. The phrase patterns for the system were constructed manually, which is very tedious. Future work will explore building new phrase patterns automatically from existing ones.
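The reported success rate follows directly from the counts given in this chapter. The short check below (variable names are illustrative) confirms that the 130 parsed phrases and the three failure groups account for all 150 captions.

```python
# Counts reported in this chapter for the 150 test captions.
parsed = 130
failures = {
    "plural nouns": 13,
    "conjunctions": 2,
    "parenthetical explanations": 5,
}

total = parsed + sum(failures.values())
success_rate = round(100 * parsed / total, 2)
print(total, success_rate)  # 150 86.67
```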

CHAPTER 6
CONCLUSION AND FUTURE DIRECTIONS

This dissertation presents a new approach to retrieving images with the help of an ontology. A set of 291 images describing crops and their pests (the “crop-pest domain”) was used to develop the approach. An ontology called the “crop-pest ontology” was built to cover the concepts of the crop-pest domain needed to retrieve the 291 images. It provides a formal description of these concepts and supports reasoning processes, such as image search, based on that formal structure.

A practical methodology for developing the crop-pest ontology was proposed following the principles described earlier, and each step of the development was explained with specific examples. The crop-pest ontology contains 286 classes, 81 object properties, 36 datatype properties, and 305 individuals. Its top level was imported from the Suggested Upper Merged Ontology (SUMO), a good example of reusing an existing ontology. As validation, the consistency of the crop-pest ontology was checked with an OWL reasoner (an open-source, Java-based OWL DL reasoner). The validation showed that the classes, properties, and individuals in the crop-pest ontology are logically consistent.

A complete ontology not only supports its intended applications and functions properly but can also be reused to develop other ontologies, so evaluating the crop-pest ontology is an essential step. The crop-pest ontology was evaluated quantitatively by testing its coverage against a domain-specific corpus: 138 terms from the corpus were checked against both the crop-pest ontology and the well-known agricultural ontology AGROVOC. The crop-pest ontology covered 44.93% of the tested terms, while AGROVOC covered 30.43%. This indicates that, for this domain-specific corpus, the crop-pest ontology has better coverage than AGROVOC and can therefore support ontology-assisted image retrieval in the crop-pest domain.

Jacob’s claim that “a controlled vocabulary is itself an ontology” motivated a practical comparison between the National Agricultural Library Thesaurus (NALT) and the crop-pest ontology. The comparison considered two abilities of each: representing domain knowledge, and reasoning based on that representation. NALT represents domain knowledge with simple relations such as BT (broader term) and RT (related term), which leaves the relations ambiguous. In addition, NALT is not written in a formal language, so a reasoner cannot access its components or perform any reasoning over them. The crop-pest ontology, by contrast, is written in a formal language, so a reasoner can deduce new conclusions from true statements. This reasoning ability supports information search with high precision and recall, and it also provides automatic validation of logical consistency. The practical comparison therefore leads to the following conclusion: the crop-pest ontology provides a better representation of domain knowledge and greater reasoning power based on that representation, which could improve search techniques in agricultural information systems.

Indexing images is vitally important for supporting image browsing. Current techniques that index images with a thesaurus can overlook information inside an image that might be essential for indexing it. Because the crop-pest ontology describes domain knowledge better than a thesaurus, indexing images with the crop-pest ontology was explored in this research. The indexing process is as follows:

1) Manual syntactic and semantic analysis of the caption of each image
2) Creating an individual for the image in the crop-pest ontology
3) Filling in the property values of the individual
4) Associating the assigned values with classes/individuals in the crop-pest ontology
5) Saving the individual into the crop-pest ontology

The index of each image produced by steps 1 through 5 was saved as an individual in the crop-pest ontology, and this index can be retrieved for browsing images.

The need for an interface to browse images with the crop-pest ontology led to the creation of a new one. The goal of this interface is to support image browsing while avoiding negative consequences such as empty result sets or the feeling of being lost. It also lets users acquire the domain knowledge described by the crop-pest ontology while they browse concepts or images. The new interface supports the following features:

1) Browsing images
2) Visualizing the hierarchical structure of classes in the crop-pest ontology
3) Showing the properties of the crop-pest ontology
4) Showing the relationships of each image with concepts in the crop-pest ontology
5) Finding all images related to a particular class

A usability study was conducted as an online evaluation comparing the new interface, which retrieves images with the crop-pest ontology, to a conventional keyword-based search interface. This preliminary usability study indicates that participants encountered fewer empty results when retrieving images with the crop-pest ontology. Moreover, it shows that image retrieval with the crop-pest ontology helps users find relevant images by transferring domain knowledge to them.

An information extraction system was developed using the crop-pest ontology and phrase patterns as domain-specific knowledge sources. The system provides a semiautomatic approach to creating indexes based on syntactic and semantic analysis of natural language. It was tested with 150 image captions, of which 130 phrases (86.67%) were parsed and converted to semantic structures successfully. The phrase patterns for the system were constructed manually, which is very tedious. Future work will explore building new phrase patterns automatically from existing ones.
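The coverage figures summarized in this chapter are proportions of the 138 test terms: 44.93% and 30.43% correspond to 62 and 42 terms, respectively. The sketch below shows the calculation under a simplifying assumption, a case-insensitive exact match of each term against a vocabulary's labels; the matching rules actually used in the evaluation are not given here, and the toy terms and vocabulary are illustrative only.

```python
from typing import Iterable


def coverage(terms: Iterable[str], vocabulary: Iterable[str]) -> float:
    """Percentage of test terms found in a vocabulary (case-insensitive exact match)."""
    terms = list(terms)
    labels = {label.lower() for label in vocabulary}
    hits = sum(1 for term in terms if term.lower() in labels)
    return 100.0 * hits / len(terms)


# Reported results: 62 of 138 terms (crop-pest ontology), 42 of 138 (AGROVOC).
crop_pest_rate = round(100 * 62 / 138, 2)
agrovoc_rate = round(100 * 42 / 138, 2)

# Toy illustration with made-up labels.
toy = coverage(["boll weevil", "honeydew", "bluish-green", "cotton aphid"],
               ["Boll Weevil", "Cotton Aphid", "honeydew"])
print(crop_pest_rate, agrovoc_rate, toy)  # 44.93 30.43 75.0
```

An exact-match rule understates coverage whenever a vocabulary stores a term under a different form (plural, synonym, or scientific name), so the matching rule should be stated alongside any reported percentage.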
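The last interface feature, finding all images related to a particular class, relies on the class hierarchy rather than on caption words, which is one way ontology-based browsing can avoid empty result sets. The sketch below illustrates the idea under stated assumptions: image individuals are modeled as plain dictionaries of property values; the class names, property names, and class assignments are hypothetical (only the caption texts come from Appendix A); and subclass reasoning is reduced to walking a single-parent hierarchy instead of calling an OWL reasoner.

```python
from typing import Dict, List, Optional

# Hypothetical fragment of the class hierarchy (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "Bollworm": "Insect",
    "StinkBug": "Insect",
    "SouthernGreenStinkBug": "StinkBug",
    "Insect": "Organism",
}

# Hypothetical image individuals: image number -> property values (classes).
IMAGE_INDEX: Dict[int, Dict[str, str]] = {
    182: {"organism": "Bollworm", "plantPart": "Bloom", "host": "Cotton"},
    238: {"organism": "SouthernGreenStinkBug", "plantPart": "Leaf", "host": "Cotton"},
    240: {"organism": "StinkBug", "plantPart": "Boll", "host": "Cotton"},
}

# Caption texts for the same images, taken from Appendix A.
CAPTIONS: Dict[int, str] = {
    182: "Small bollworm in white cotton bloom.",
    238: "Southern green stink bug on cotton leaf.",
    240: "Stink bug injury to cotton boll.",
}


def superclasses(cls: str) -> List[str]:
    """Return cls followed by all of its ancestors in the hierarchy."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = SUBCLASS_OF.get(current)
    return chain


def images_about(cls: str) -> List[int]:
    """Images with any property value that is cls or one of its subclasses."""
    return sorted(
        image for image, values in IMAGE_INDEX.items()
        if any(cls in superclasses(value) for value in values.values())
    )


def keyword_search(word: str) -> List[int]:
    """Conventional baseline: images whose caption contains the word."""
    return sorted(i for i, text in CAPTIONS.items() if word.lower() in text.lower())


print(keyword_search("insect"))  # []
print(images_about("Insect"))    # [182, 238, 240]
print(images_about("StinkBug"))  # [238, 240]
```

A keyword search for “insect” returns nothing because no caption contains that word, while the class-based lookup returns every image indexed with a subclass of `Insect`.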

APPENDIX A
291 IMAGE CAPTIONS

1: photograph of insect pest
2: photograph of pest of agronomic crops
3: photograph of peanut pest
4: Thrips on peanut photograph
5: Rednecked Peanutworm on peanut photograph
6: Leafhoppers on peanut photograph
7: Hepperburn from leafhoppers on peanut photograph
8: Three-cornered Alfalfa Hopper on peanut photograph
9: Lesser Cornstalk Borer on peanut photograph
10: Whitefringed Beetle Grub on peanut photograph
11: Southern Corn Rootworm (Spotted Cucumber Beetle Larva) on peanut photograph
12: Wireworm Larva (Click Beetle grubs) on peanut photograph
13: Corn Earworm on peanut photograph
14: Fall Armyworm on peanut photograph
15: Damage caused by fall armyworm on peanut photgraph
16: Stink Bug on peanut photograph
17: Cutworm on peanut photograph
18: Damage caused by cutworm on peanut photograph.
19: Spider Mites on peanut photograph
20: Southern Armyworm on peanut photograph
21: Velvetbean Caterpillar on peanut photograph
22: soybean pest photograph
23: Lesser Cornstalk Borer on soybean photograph
24: Lesser Cornstalk Borer damage on soybean photograph
25: Whitefringed Beetle on soybean photograph
26: Three-cornered Alfalfa Hopper on soybean photograph
27: Three-cornered alfalfa hopper damage on soybean photograph
28: Velvetbean Caterpillar on soybean photograph
29: Looper on soybean photograph
30: Green Cloverworm on soybean photograph
31: Beet Armyworm on soybean photograph
32: Fall Armyworm on soybean photograph
33: Corn Earworm on soybean photograph
34: Stink Bug on soybean photograph
35: Damage by Stink Bug on soybean photograph
36: Bean Leaf Beetle on soybean photograph
37: Soybean Stem Borer on soybean photograph
38: Grasshopper on soybean photograph
39: Yellow-striped Armyworm on soybean photograph
40: Mexican Bean Beetle on soybean photograph
41: Blister Beetle on soybean photograph
42: Snowy Tree Cricket on soybean photograph
43: cotton pest photograph
44: Beet Armyworm on cotton photograph
45: Thrips on cotton photograph
46: Damage by thrips on cotton photograph
47: Tarnished Plant Bug on cotton photograph
48: Damage by tarnished plant bug on cotton photograph.
49: Bollworm on cotton photograph
50: Cotton Aphid on cotton photograph
51: Fall Armyworm on cotton photograph
52: Looper on cotton photograph
53: Cotton Leafworm on cotton photograph
54: Whitefly on cotton photograph
55: Stink Bug on cotton photograph
56: Damage by stink bug on cotton photograph
57: European Corn Borer on cotton photograph
58: Boll Weevil on cotton photograph
59: Cutworm on cotton photograph
60: Spider Mite on cotton photograph
61: Cotton Stainer on cotton photograph
62: White Fringed Beetle on cotton photograph
63: Southern Armyworm on cotton photograph
64: Cotton Fleahopper on cotton photograph
65: Leafminer on cotton photograph
66: Flea Beetle on cotton photograph
67: Sugarcane beetle on cotton photograph
68: Cotton Square Borer on cotton photograph
69: Tobacco Budworm on cotton photograph
70: Thrips damage to peanut leaves.
71: Closeup of adult thrips on peanut leaf.
72: Rednecked peanutworm and damage on peanut.
73: Rednecked peanutworm in peanut bud.
74: Hopperburn caused by leafhoppers on peanut leaves.
75: Closeup on hopperburn caused by leafhoppers on peanut leaf.
76: Overview of hopperburn on peanut caused by leafhoppers.
77: Adventitious root growth on peanut caused by three-cornered alfalfa girdling.
78: Lesser cornstalk borer silken feeding tubes on peanut pegs.
79: Lesser cornstalk borer adult moths (male left, female right).
80: Closeup of lesser cornstalk borer larva on peanut leaf.
81: Whitefringed beetle grub in soil at base of peanut plant.
82: Spotted cucumber beetle (Southern Corn Rootworm adult) on peanut leaf.
83: Southern corn rootworm (Spotted cucumber beetle larva) on peanut peg.
84: Southern corn rootworm (Spotted cucumber beetle larva) damage to peanut pod.
85: Wireworm larva on soil in peanut field.
86: Medium corn earworm larva on edge of peanut leaf.
87: Large corn earworm larva on peanut leaf.
88: Profile of small corn earworm larva on edga of peanut leaf.
89: Fall armyworm egg mass on peanut leaf.
90: Small armyworm larva and minor damage on peanut leaf.
91: Hatching fall armyworm egg mass on peanut.
92: Fall armyworm damage to peanut bud.
93: Stink bug egg mass in peanut leaf.
94: Cutworm damage to peanut pod.
95: Spider mites on peanut leaf.
96: Large southern armyworm larva on peanut leaf.
97: Dark phase of velvetbean caterpillar on peanut stem.
98: Dark phase of velvetbean caterpillar on peanut stem.
99: Lesser cornstalk borer larva with damage on soybean.
100: An adult lesser cornstalk borer moth on the ground beneath a soybean plant.
101: Lesser cornstalk borer larva on soil beneath soybean.
102: Lesser cornstalk borer damage on soybean.
103: Whitefringed beetle grub in soil under soybean plant.
104: Adult whitefringed beetle with damage on soybean.
105: Adult whitefringed beetle on soybean.
106: Three-cornered alfalfa hopper nymph on soybean.
107: Adult three-cornered alfalfa hopper on soybean stem.
108: Damage caused by three-cornered alfalfa hopper on soybean.
109: Small velvetbean caterpillar larvae on soybean.
110: Large velvetbean caterpillar larva on soybean.
111: Looping velvetbean caterpillar larva on edge of soybean leaf.
112: Adult velvetbean caterpillar moth resting on soybean leaf.
113: Velvetbean caterpillar larva on soybean.
114: Adult velvetbean caterpillar moth on soybean.
115: Dark phase of velvetbean caterpillar on soybean.
116: Looper larva on soybean leaf.
117: Large looper larva on soybean leaf.
118: Large looper on soybean leaf.
119: Close-up of large looper larva on soybean leaf.
120: Large green cloverworm larva on soybean leaf.
121: Green cloverworm larva on soybean leaf.
122: Adult green cloverworm moth on soybean.
123: Beet armyworm larva on soybean.
124: Beet armyworm larva curled up on soybean leaf.
125: Adult beet armyworm moth on soybean leaf.
126: Large beet armyworm larva on soybean.
127: Large fall armyworm larva on soybean.
128: Adult fall armyworm moth on soybean.
129: Small corn earworm larva on edge of soybean leaf.
130: Corn earworm larva on soybean foliage.
131: Corn earworm larva on soybean pod.
132: Close-up of adult corn earworm moth on soybean
133: Stink bug nymphs on soybean pod.
134: Stink bug nymphs on dried soybeans.
135: Southern green stink bug on damaged soybean leaf.
136: Close-up southern green stink bug on soybean leaf. Notice the dried soybean laying on leaf for size comparison.
137: Small stink bug nymphs next to egg mass on soybean leaf.
138: Small stink bug nymphs on egg mass on soybean leaf.
139: Southern green stink bug nymph on soybean leaf.
140: Close-up on black stink bug on soybean. Notice the dried soybean for size comparison.
141: Stink bug egg masses on soybean leaf.
142: Adult brown stink bug on soybean.
143: Soybean pod damage caused by stink bug.
144: Soybean pod damage caused by stink bug on soybean.
145: Adult bean leaf beetle with damage on soybean.
146: Bean leaf beetle adult with damage on soybean.
147: Soybean stem borer adult on soybean stem.
148: Soybean stem borer larva in damaged soybean stem.
149: Grasshoppers on soybean leaf.
150: Close-up of grasshopper on soybean stem.
151: Large yellow-striped armyworm on soybean.
152: Mexican bean beetle eggs on underside on soybean leaf.
153: Mexican bean beetle larva on soybean.
154: Pupa of mexican bean beetle on soybean leaf.
155: Adult mexican bean beetle on soybean leaf.
156: Adult blister beetle on soybean leaf.
157: Snowy tree cricket on soybean.
158: Healthy and parasitized beet armyworm larvae with feeding damage on cotton.
159: Large beet armyworm larva behind bract of cotton bloom.
160: Hatching beet armyworm egg mass on cotton.
161: Small beet armyworm feeding in cotton leaf.
162: Beet armyworm larvae feeding in cotton bloom.
163: Beet armyworm pupa on soil at base of cotton plant.
164: Thrips on cotyledon leaf of cotton.
165: Thrips damage to the anthers of a cotton flower.
166: Adult and immature western flower thrips on cotton.
167: Western flower thrips on cotton.
168: Tobacco thrips adult on cotton.
169: Cotton seedling damaged by thrips.
170: Tarnished plant bug on cotton bract.
171: Tarnished plant bug nymph on cotton bract.
172: Two-day old tarnished plant bug egg.
173: Six-day old tarnished plant bug egg.
174: Mid-season plant bug damage to white cotton bloom.
175: Damage caused by a tarnished plant bug on a pinhead square.
176: Bollworm egg at the base of a cotton square.
177: Bollworm egg on cotton leaf.
178: Small bollworm on a small cotton square.
179: Small bollworm on terminal of cotton plant.
180: Four day old bollworm on cotton.
181: Five to six day old bollworm on cotton.
182: Small bollworm in white cotton bloom.
183: 4-day old bollworm larva in dried bloom tag with cotton boll damage.
184: Bollworm damage to small cotton boll under bloom tag.
185: Bollworm egg on brown cotton bloom tag.
186: 6-day old bollworm larva in white cotton bloom.
187: 6-day old bollworm on small cotton boll.
188: Small bollworm on cotton square showing damage.
189: Bollworm feeding through the white bloom into the cotton boll.
190: 4- to 5-day old bollworm larva under the cotton bloom tag with boll damage.
191: Old boll tip feeding damage from a bollworm on a large boll.
192: Old boll tip feeding damage from a bollworm on a mature boll.
193: Bollworm adult moth on cotton leaf.
194: Large bollworm larva on cotton stem.
195: Closeup of head and thorax of a bollworm larva.
196: Large bollworm larva under cotton bloom tag.
197: Bollworm egg on dried cotton bloom.
198: Bollworm eggs on cotton leaf.
199: Cotton bollworm larva on boll in Bt cotton.
200: Bollworm egg on brown bloom tag.
201: Bollworm larva feeding in cotton bloom.
202: Bollworm larva under brown bloom tag.
203: Cotton aphids on cotton leaf.
204: Cotton aphid honey dew and cupped cotton leaves.
205: Cotton aphids with fungus disease.
206: Sooty mold on cotton lint caused by aphid honeydew.
207: Four day old fall armyworm larva feeding on boll bract.
208: Large fall armyworm larva behind the bract of white cotton bloom.
209: 3-day old fall armyworm larva on cotton boll bract.
210: Fall armyworm on small cotton square with damage.
211: Fall armyworm damage to cotton boll bract.
212: Small fall armyworm on cotton square in top of plant.
213: Fall armyworm egg mass on cotton leaf.
214: Small fall armyworm in white cotton bloom.
215: Small fall armyworm larva with feeding damage to cract calyx and boll.
216: Small fall armyworm larva with feeding damage on bract.
217: Bract etching by fall armyworm.
218: Fall armworm larva boring into cotton stem.
219: Fall armyworm larva feeding in white bloom.
220: General view of looper feeding damage on cotton.
221: Soybean Looper on cotton leaf.
222: Soybean looper feeding on cotton leaf.
223: Soybean looper pupa on a cotton leaf.
224: Cabbage looper on cotton leaf.
225: Large soybean looper larva on cotton leaf.
226: Large cabbage looper larva on cotton leaf.
227: Soybean looper pupa on cotton leaf.
228: Soybean looper larva on cotton leaf.
229: Cotton leafworm larva with feeding damage on cotton.
230: Cotton leafworm larva and damage.
231: Cotton leafworm larva with damage.
232: Banded-winged whitefly adults and eggs on cotton.
233: Banded-winged whitefly pupae on cotton leaf.
234: Sweet potato whitefly on the underside of a cotton leaf.
235: Closeup of adults and eggs of the sweet potato whitefly.
236: Closeup of sweet potato larvae and pupae on cotton.
237: Silverleaf whitefly on cotton leaf.
238: Southern green stink bug on cotton leaf.
239: Southern green stink bug on cotton boll with feeding damage.
240: Stink bug injury to cotton boll.
241: no description
242: Wart on inside of cotton boll from stink bug injury
243: Outside boll blemish and brown lint from stink bug damage.
244: Boll rot caused by stink bug feeding.
245: Stink bug damaged boll versus normal open boll.
246: Southern green stink bug nymph and damaged boll.
247: Southern green stink bug nymph on cotton leaf.
248: Adult stink bug feeding on cotton boll.
249: Southern green stink bug adult feeding on cotton boll.
250: Stink bug damage to a 4-day old boll.
251: Dissected 4-day old boll showing internal stink bug damage.
252: no image caption
253: Pinned specimens of adult european corn borer moths.
254: European corn borer egg mass on leaf.
255: European corn borer larva in cotton stem.
256: European corn borer larva in cotton boll.
257: European corn borer larva and boll damage.
258: European corn borer damage to cotton bolls.
259: Boll weevil on cotton boll.
260: Boll weevil pupa in cotton square.
261: Cotton square punctured by boll weevil.
262: Cutworm on soil curled in C-shape.
263: Cutworm in soil at base of cotton plant.
264: Cutworm in soil at the base of cotton planted in wheat stubble.
265: Cutworm in soil with damaged cotton plant.
266: Spider mite damage to cotton leaf.
267: Spider mites on underside of cotton leaf.
268: Cotton stainer adult in a white cotton bloom.
269: Cotton stainer nymphs on rotted cotton boll.
270: Cotton stainer nymph on cotton leaf.
271: White fringed beetle grub and damage to cotton seedling.
272: White fringed beetle grub damage to cotton.
273: Closeup of white fringed beetle grub feeding damage to cotton
274: Above ground symptoms of white fringed beetle grub feeding.
275: Southern armyworm egg mass on cotton leaf.
276: Small southern armyworm larvae on cotton leaf with feeding damage.
277: Southern armyworm (early instar) on cotton leaf with damage.
278: Late instar southern armyworm on cotton leaf.
279: Several color variations of late instar southern armyworms.
280: Large southern armyworm larva on cotton leaf.
281: Southern armyworm larva on cotton leaf.
282: Small southern armyworm larvae with damage.
283: Large southern armyworm larvae showing color variations.
284: Cotton fleahopper on cotton leaf.
285: Leafminer damage to cotton leaf.
286: Leafminer damage to cotton.
287: Flea beetle and damage to cotton leaf.
288: Sugarcane beetle larva on boll in Bt cotton.
289: Sugarcane beetle on finger.
290: Cotton square borer larva and damage.
291: Tobacco budworm moth on cotton leaf

APPENDIX B
A LIST OF WORDS APPEARING IN THE 291 IMAGE CAPTIONS

head 1 calyx 1 silverleaf whitefly 1 internal 1 normal 1 pinhead bollworm 1 parasitized 1 boll cutworm 1 resting 1 tubes 1 above 1 immature 1 field 1 open 1 medium 1 armyworms 1 3-day 1 girdling 1 etching 1 finger 1 planted 1 dew 1 versus 1 stubble 1 laying 1 brown stink bug 1 inside 1 several 1 cupped 1 view 1 soybean. 1 foliage 1 masses 1 disease 1 six 1 right) 1 beet 1 pinned 1 dissected 1 wart 1 boring 1 female 1 six-day 1 large yellow-striped armyworm 1 an 1 top 1 tobacco 1 rot 1 five 1 41 flea 1 fall 1 outside 1 black stink bug 1 cornstalk 1 rotted 1 5-day 1 edga 1 caterpillar 1 c-shape 1 minor 1 cract 1 growth 1 plant bug 1 bolls 1 hepperburn 1 aphid 1 velvetbean 1 next 1 specimens 1 anthers 1 adventitious 1 pegs 1 wheat 1 fall armworm 1 blemish 1 sooty 1
green 1 agronomic 1 leafminer 1 thorax 1 two-day 1 grasshoppers 1 mature 1 left 1 insect 1 overview 1 sweet potato whitefly 1 cotton bollworm 1 early 1 grubs 1 general 1 profile 1 cotyledon 1 terminal 1 weevil 1 punctured 1 three-cornered alfalfa 1 mold 1 male 1 symptoms 1 root 1 through 1 borer 1 crops 1 honey 1 looping 1 fungus 1 mid-season 1 healthy 1 photgraph 1 up 1 honeydew 1 silken 1 lesser 1 soybeans 1 peg 1 flea beetle 1 yellow-striped armyworm 1 moths 2 cotton square borer 2 grasshopper 2 snowy tree cricket 2 cotton fleahopper 2 western 2 beetle 2 variations 2 cotton aphid 2 cotton aphids 2 seedling 2 sweet potato 2 injury 2 pupae 2 spider mite 2 tip 2 comparison 2 into 2 whitefly 2 6-day 2 cotton leafminer 2 beneath 2 tobacco budworm 2 late 2 behind 2 southern 2 bud 2 ground 2 blister beetle 2 lint 2 curled 2 bt 2 wireworm 2 for 2 four 2 banded-winged whitefly 2 color 2 size 2 notice 2 adults 2 cabbage looper 2 hatching 2 rednecked peanutworm 3 phase 3 showing 3 underside 3 spider mites 3 armyworm 3 bean leaf beetle 3 hopperburn 3
edge 3 flower 3 leaves 3 instar 3 boll weevil 3 dark 3 soybean stem borer 3 4-day 3 day 3 sugarcane beetle 3 southern corn rootworm 4 cotton leafworm 4 spotted cucumber beetle 4 eggs 4 green cloverworm 4 cotton stainer 4 brown 4 pest 5 base 5 leafhoppers 5 under 5 at 5 white fringed beetle 5 nymphs 5 dried 5 close-up 5 mexican bean beetle 5 from 5 pupa 5 soybean looper 5 three-cornered alfalfa hopper 6 nymph 6 whitefringed beetle 6 pod 6 damaged 6 southern green stink bug 7 european corn borer 7 larvae 7 grub 7 tarnished plant bug 7 closeup 7 cutworm 7 stink 7 tag 7 white 8 looper 8 plant 8 bug 8 square 8 moth 9 bract 9 soil 9 lesser cornstalk borer 9 corn earworm 9 stem 9 mass 9 velvetbean caterpillar 10 beet armyworm 11 southern armyworm 11 thrips 11 the 12 caused 12 old 14 stink bug 15 a 15 egg 16 and 17 with 18 large 18 by 19 bloom 19 fall armyworm 20 small 21 to 22 feeding 22 in 26 bollworm 26 adult 26 boll 31 of 43 peanut 47 damage 57 larva 59 photograph 68 leaf 69 soybean 82 cotton 124 on 217

APPENDIX C
138 TERMS USED IN THE EVALUATION OF THE CROP-PEST ONTOLOGY AND THE AGROVOC ONTOLOGY

adult adult bollworm agrotis spp. along Red imported fire ant associated attacking base beet armyworm before big-eyed bug nymph bluish-green boll boll damage boll feeding boll rot boll weevil bollworm bollworm damage bollworm egg bract brown cotton brownish bud bug feeding cabbage looper caterpillar caused causes cocoon color variation coloration conchuela convergent cotton cotton aphid cotton fleahopper cotton leafperforator cotton leafworm cotton square Cotton square borer curl cutworm cycle damage damsel bug dark dark colored dark spots destruction egg egg mass evidence excrement excreted exit exposed exposed egg fall armyworm feeding flared cotton fleahopper flimsy foliage damage forming free freshly laid bollworm grasshopper green green lacewing larva hole honeydew immature inside lady beetle large
larva leaf leaf mining leafhopper assassin bug leafperforator leafroller cotton leafworm left life life cycle lint mass midveins minute growing newly nymph omnivorous pale pink bollworm pirate bug plant plant bug puncture pupation red reddening reduced rot saltmarsh caterpillar seeding severe showing skeletonizing slender small small larva Sooty mold spider mite square stand stink bug stink bug adult cotton strainer surface tan moth tarnished plant bug tarnished plant bug adult terminal thrips tobacco budworm tobacco budworm larva typical underside upward view white whitefly world yellow yellow-striped armyworm yellowing

92 APPENDIX D THE CROP-PEST ONTOLOGY WRITTEN IN OWL the process or condition of decaying or a decayed area noun fall armyworm on cotton phot ograph

PAGE 106

93
tobacco thrips cotton fleahopper a flower that has not yet opened the process of laying explosive mines Attributes that apply specifically to instances of Organism plant group found in a particular countr y, region, or time

PAGE 107

94 grasshopper
a thing that has a location in space-time. Note that locations are themselves understood to have a location in space-time the end of objects growth on plant: any abnormal growth that looks like a wart and is found on a plant brown lint mexican bean beetle grub

PAGE 108

95 looper pupa
any of the flat green parts that grow in various shapes from the stems or branches of plants and trees and whose main function is photosynthes sweet potato whitefly larva a member of a set of words used in close connection with, and us ually before, nouns and pronouns to show their relation to some ot her part of a clause. any of the digits of the hand, sometimes excluding the thumb

PAGE 109

96 atribute about organism's health
color variation the period between birth and the present time of a relatively little size developmental stage of insect, usually egg,larva, pupa and adult any Object that does not consist of two or more disconnected parts Any Sel fConnected Object that expre sses information

PAGE 110

97
corn earworm moth velvetbean caterpillar egg Reproductive structure of Organisms. Consists of an Embryonic Obje ct and a nutritive/protective envelope. Note that this class includes seeds, spores, and Fr uitOrVegetables, as well as the eggs produced by Animals. any change in bodily function that is experien ced by a patient and is associated with a particular disease the long narrow outer case holding the seeds of a plan t such as the pea, bean, or vanilla. cotton aphid with disease

PAGE 111

98
flea beetle on cotton photograph Something or someone that can act on its own and produce ch anges in the world The class of temporal durations (instances of TimeDuration) and positions of TimePoints and TimeIntervals along the universal timeline (ins tances of TimePosition). agronomic crop pest photograph pest photograph on peanut flea beetle stink bug

PAGE 112

99
a sweet sticky substance deposited on leaves by aphids and certain other in sects as a by-product of the juices they suck from plants cotton fleahopper on cotton photograph 6 western flower thrips cutworm

PAGE 113

100 An Organism having cellulose cell walls, growi ng by synthesis of Substances, generally distinguished by the presence of chlorophy ll, and lacking the power of locomotion.
fall armyworm larva boring A collection of Cells and Tissues which are localiz ed to a specific area of an Organism and which are not pathological green cloverworm threecornered alfalfa hopper silverleaf whitefly

PAGE 114

101 spider mite on cotton photograph
the elongated portion of the body of an arthropod, lo cated behind the thorax. It is usually segmented feeding on plant part end of cotton boll southern green stink bug nymph

PAGE 115

102 bodyPart in animal
the process which make injury or damage spider mite process of eat something the act of drilling Qualities which we cannot or choose not to reify into subc lasses of Object

PAGE 116

103 the axis of the skeleton of a vertebrate animal, extendi ng from the head and consisting of a series of interconnected vertebrae that enclose and protect the spinal cord
The Class of VisualAttributes relating to the color of Objects soybean stem borer larva cotton aphid on cotton photograph the subclass of Content Beari ng Objects which are language -related the topmost part of a insect body, where the br ain, eyes, nose, ears, mouth, and jaws are situated

PAGE 117

104 sugarcane beetle larva
velvetbean caterpillar pupa an image produced on light-sensitive film inside a camera or digital camera A PhysicalQuantity is a measure of some quantifiable aspect of the modeled world, such as 'the earth's diameter' sweet potato whitefly egg stink bug on cotton photograph beetle

PAGE 118

105 a lowgrowing annual plant of the legume family w hose seeds are contained in pods that are forced underground as they grow. An Organism with eukaryotic Cells, and lacking stiff cell walls, plastid s, and photosynthetic pigments. A normal or pathological part of the anatomy or structural organization of an Organism. This class covers BodyParts, as well as stru ctures that are given off by Organisms, e.g. ReproductiveBodies

PAGE 119

106 thrips on cotton photograph body part in cotton green cloverworm larva Corresponds roughly to the class of ordinary objects. Examples include normal physical objects, geographical regions, and locations of Processes, the complement of Objects in the Physical class

PAGE 120

107 photograph about insects
human body part Any Attribute whose presence is detected by an act of Perception. process of the process of developing, developing something, or of being developed, for example, by growth or metamorphosis european corn borer moth

PAGE 121

108
body part of the bottom of bloom in cotton developmental stage cotton which has Bt toxins for ins ect tolerance comparatively big in size, number, or quantity, or bigge r in size, number, or quantity than is usual or expected early phase quantity which are based on content such as

a word that modifies a verb, an adjective, another adverb, or a sentence, for example, "happily," "very," or "frankly" disease caused by fungus the intensity of light reflected or given off by something the leaves of a plant or tree a developmental process to cause a young organism in insect to emerge from its egg

white fringed beetle boll weevil on cotton photograph the bottom or lowest part white fringed beetle on cotton photograph Processes which involve altering an internal property of an Object, e.g. the shape of the Object, its coloring, its structure, etc. Processes that are not instances of this class include changes that only affect the relationship to other objects, e.g. changes in spatial or temporal location.
cutworm on peanut photograph tarnished plant bug egg the highest part or point living organism that differs: a living organism that differs from the normal form for its kind inspecting something brown stink bug

something produced as a secondary result of production of something else tobacco budworm on cotton photograph a single-celled, often parasitic microorganism without distinct nuclei or organized cell structures. Various species are responsible for decay, fermentation, nitrogen fixation, and many plant and animal diseases.

an invertebrate animal that has jointed limbs, a segmented body, and an exoskeleton made of chitin
An Object in which every part is similar to every other in every relevant respect. More precisely, something is a Substance when it has only arbitrary pieces as parts; any parts have properties which are similar to those of the whole. Note that a Substance may nonetheless have physical properties that vary. For example, the temperature, chemical constitution, density, etc. may change from one part to another. An example would be a body of water. A Process embodied in an Organism an annual grass, native to southwestern Asia and the Mediterranean, some types of which are widely cultivated in temperate regions for their edible grains. The numerous varieties of cultivated wheat are based on three main species: bread wheat, durum or hard wheat, and emmer. Genus Triticum.

The Class of visually discernible properties feeding caused by stink bug An Animal which has a spinal column. soybean body part anything that lasts for a time but is not an Object feeding caused by pest the series of changes of form and activity that a living organism undergoes from its beginning through its development to sexual maturity

body part of plant the main axis of a plant that bears buds and shoots. It is usually above ground, although some plants have underground stems green stink bug looper on cotton photograph Attributes characterizing the orientation or position of an Object A negative or nonnegative whole number.

bollworm on cotton photograph
the amount, scope, or degree of something, in terms of how large or small it is a flower, especially on a plant cultivated chiefly for its flowers square in cotton corn earworm

to sustain or cause a small hole or wound in something such as the skin
injury or harm from agent beet armyworm feeding feeding damage sugarcane beetle on cotton photograph european corn borer

mexican bean beetle egg fall armyworm feeding small Arthropods that are air-breathing and that are distinguished by appearance. Any Attribute of an Entity that is an internal property of the Entity, e.g. its shape, its color, its fragility, etc.

Collections have members. They have a position in space-time and members can be added and subtracted without thereby changing the identity of the Collection. Some examples are toolkits, football teams, and flocks of sheep. a detailed view or examination of something attribute of organism's age Any measure of length of time, with or without respect to the universal timeline a process that alters or interferes with a normal process, state or activity of an Organism

soybean looper pupa snowy tree cricket true true

blister beetle a clearly distinguishable period or stage in a process, in the development of something, or in a sequence of events body part of insect borer Any Number that can be expressed as a (possibly infinite) decimal tobacco budworm moth

european corn borer on cotton photograph true true dark phase

the group of sepals, usually green, around the outside of a flower that encloses and protects the flower bud
Any Attribute that an Entity has by virtue of a relationship that it bears to another Entity or set of Entities cotton stainer on cotton photograph a single-celled organism such as an amoeba that can move and feeds on organic compounds of nitrogen and carbon. soybean stem borer egg

a southeastern Asian plant cultivated around the world for its nutritious seeds, for soil improvement, and to provide grazing for animals abnormal growth A SelfConnectedObject whose parts have properties that are not shared by the whole in the life cycle of an arthropod such as an insect, a stage between two successive molts Any RealNumber that is the product of dividing two Integers tarnished plant bug nymph

tobacco budworm sugarcane beetle 6 soybean looper

feeding caused by bollworm the length of time that insect has existed, usually expressed in days the outline of something's form cotton leafworm on cotton photograph object which consists of environment the silky covering with which a caterpillar or other insect larva encloses itself during its transition to an adult state

stink bug egg Any TimePoint or TimeInterval along the universal timeline from NegativeInfinity to PositiveInfinity a mark or imperfection that spoils the appearance of something green cloverworm moth

the length of time that somebody or something has existed
yellow striped armyworm cabbage looper black stink bug bandedwinged whitefly egg pest which is not an organism such as virus and prion

true true velvetbean caterpillar moth Properties or qualities as distinguished from any particular embodiment of the properties/qualities in a physical medium. Instances of Abstract can be said to exist in the same sense as mathematical objects such as sets and relations, but they cannot exist at a particular place and time without some physical encoding or embodiment cotton square borer on cotton photograph the act of etching

pest photograph on soybean
soybean stem borer A measure of how many things there are, or how much there is, of a certain kind. Numbers are subclassed into RealNumber, ComplexNumber, and ImaginaryNumber. A nonOrganism consisting of a core of a single nucleic acid enclosed in a protective coat of protein. A virus may replicate only inside a host living cell. A virus exhibits some but not all of the usual characteristics of living things.

european corn borer larva cotton aphid a sequence of events that is repeated again and again, especially a causal sequence southern armyworm on cotton photograph whitefly

egg-larva-pupa-adult
cutworm on cotton photograph threecornered alfalfa egg a male flower part, the top part of a stamen, that bears the pollen in pollen sacs. feeding on cotton body part objects which encompass Organisms and CorpuscularObjects that are parts of Organisms

bandedwinged whitefly
threecornered alfalfa nymph An Animal which has no spinal column leafminer armyworm bandedwinged whitefly larva soybean stem borer pupa

bollworm egg An interval of time corn earworm on peanut photograph tarnished plant bug sweet potato whitefly a rounded seedpod or capsule, especially of cotton

the surface of the land (soil)
tarnished plant bug on cotton photograph of middling size or dimensions, neither large nor small indicate part of organic object which is not anatomic structure, i.e. edge and base

a tropical or subtropical bush producing soft white downy fibers and oil-rich a modified leaf that arises from the stem at the point where the flower or flower cluster develops in cotton a living thing such as a plant, animal, or bacterium stink bug nymph

A term of a Language that represents a concept cotton stainer nymph pest photograph on cotton the product of a Making by organism etching on plant body part a process to develop from a larva into a pupa

thrips Any specification of how many or how much of something there is. soybean stem borer adult bollworm southern green stink bug something that is representative because it is typical of its kind or of a whole, especially something that serves as an example
boll weevil pupa cotton stainer european corn borer egg the first leaf, or one of the first pair of leaves, produced by the seed of a flowering plant. They may serve as food stores, remaining in the seed at germination, or produce food by photosynthesis. bean leaf beetle

any group of plants grown by people for food or other use
white fringed beetle grub short stalks left in the ground after a grain crop has been harvested the top layer of most of the earth's land surface, consisting of the unconsolidated products of rock erosion and organic decay, along with bacteria and fungi: the place where plants live whitefly on cotton photograph looper

true a period of 24 hours, usually beginning and ending at midnight adult stage of bollworm bollworm pupa a single-celled or multicellular organism without chlorophyll that reproduces by spores and lives by absorbing nutrients from organic matter. Fungi include mildews, molds, mushrooms, rusts, smuts, and yeasts

velvetbean caterpillar A ConstantQuantity is a PhysicalQuantity which has a constant value, e.g. 3 meters and 5 hours. The magnitude (see MagnitudeFn) of every ConstantQuantity is a RealNumber. the fibers that surround unprocessed cotton seeds sweet potato pupa photograph taken with digital camera

a line or area that is the outermost part or the part farthest away from the center of organic object
young developing cotton that is grown from a seed an organism that causes any events such as damage to plant the lower side or bottom of something a collection of egg overall view

boll weevil the middle division of the body of an insect dark fungi that grow on honeydew excreted by sucking insects or on exudates from leaves of certain plants attribute of organism's maturity
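The listing above gives class labels and definitions, but not the subclass links between them. To illustrate how such links can widen an image query, here is a minimal Python sketch. The class labels are taken from the listing; the parent links, the image identifiers, and the names `PARENT`, `IMAGES`, `ancestors`, and `find_images` are hypothetical assumptions made for illustration, not the dissertation's indexing code.

```python
"""Hypothetical sketch: widening an image query with subclass links.

Class labels come from the crop-pest ontology listing; the parent links,
image ids, and all names in this module are assumptions for illustration.
"""

# Assumed fragment of the class hierarchy: child class -> direct parent class.
PARENT = {
    "feeding caused by stink bug": "feeding caused by pest",
    "feeding caused by bollworm": "feeding caused by pest",
    "feeding caused by pest": "feeding damage",
    "feeding damage": "injury or harm from agent",
}

# Assumed image index: image id -> the class its caption was mapped to.
IMAGES = {
    "img-001": "feeding caused by stink bug",
    "img-002": "feeding caused by bollworm",
    "img-003": "feeding damage",
}


def ancestors(cls):
    """Return cls followed by every superclass reachable through PARENT."""
    chain = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def find_images(query_cls):
    """Return sorted ids of images indexed under query_cls or any subclass of it."""
    return sorted(
        image_id
        for image_id, cls in IMAGES.items()
        if query_cls in ancestors(cls)
    )


if __name__ == "__main__":
    # No image is indexed directly under "feeding caused by pest", yet both
    # images indexed under its subclasses are returned.
    print(find_images("feeding caused by pest"))
```

Running the script prints `['img-001', 'img-002']`. A literal match on the phrase "feeding caused by pest" would return neither image; following subclass links is what lets a general query reach more specific captions. In an OWL setting, these links would come from the ontology's class hierarchy rather than from a hand-written dictionary.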


APPENDIX E
QUESTIONNAIRE ON EVALUATION

General questions

1. What is your name?
2. What is your occupation? Please choose: (drop-down list)
3. How often do you search for information online? Please choose: (drop-down list)
4. How often do you search for images online? Please choose: (drop-down list)
5. Do you know crops, insects, and relationships between them? Please choose: strongly agree / agree / disagree / strongly disagree / n/a

Questions on subject

1. What is your name?
2. What type of interface did you test? Please choose: A. keyword-based interface such as Google / B. graphical interface
3. How many empty results were found when using this interface? Please choose: none / 1-3 / 4-6 / 5-9 / more than 9 / n/a
4. When you get an empty result, did this interface help you find another way to retrieve? Please choose: strongly agree / agree / disagree / strongly disagree / n/a
5. Did this interface help you learn more about crops, pests, and relationships between them? Please choose: strongly agree / agree / disagree / strongly disagree / n/a
6. Did this interface help you find accurate images? Please choose: strongly agree
7. Did this interface help you retrieve images easily? Please choose: strongly agree / agree / disagree / strongly disagree / n/a
8. This interface showed terms related to images. Did the terms help you to understand relationships between crops and pests? Please choose: strongly agree / agree / disagree / strongly disagree / n/a / I did not see terms related to images.
9. Comments on this interface:

REFERENCES

Abasolo, J. and M. Gómez. MELISA: An ontology-based agent for information retrieval in medicine. ECDL 2000 Workshop on the Semantic Web, Lisbon, Portugal, Sept. 2000.

Agricultural Ontology Service (AOS), the website of the AOS project, http://www.fao.org/agris/aos/Applications/intro.htm, 2005.

Agricultural Research Service (ARS) photo gallery, a website of the photo gallery in ARS, http://www.ars.usda.gov/is/graphics/photos/, 2005.

AGRICOLA, a website of AGRICOLA, http://agricola.nal.usda.gov/, 2005.

AGROVOC ontology, a link to download the AGROVOC ontology, http://kaon.semanticweb.org/Members/rvo/ontologies/AGROVOC.zip, 2005.

Allen, J. Natural language understanding. 2nd ed. Redwood City, CA: The Benjamin/Cummings Publishing Company, 1995.

AOS, Agricultural Ontology Service (AOS), http://www.fao.org/agris/aos/, 2005.

ASK. Ask Jeeves. Ask Jeeves Incorporated, 2003. Available at: ask.com. Accessed on 12 May 2003.

Bernard M. L., Ingwersen P., The development of a method for the evaluation of interactive information retrieval systems, Journal of Documentation, 53(3): 225-250, 1997.

Blair C. D., Maron M. E., An evaluation of retrieval effectiveness for a full-text document-retrieval system, Communications of the ACM, 28: 289-299, 1985.

Brewster, C., Alani, H., Dasmahapatra, S., Wilks, Y., Data driven ontology evaluation, International Conference on Language Resources and Evaluation, Lisbon, Portugal, 2004.

Cardie C., Empirical methods in information extraction, AI Magazine, 18(4), 1997.

Chun, C., Wenlin, L., From agricultural thesaurus and ontology, In Proc. Fifth Agricultural Ontology Service (AOS) Workshop, Beijing, China, 2004, http://www.fao.org/agris/aos/ConferencesW/FifthAOS_China04/AOS_Proceedings/docs/1-3.pdf.

Dalmau M., Floyd R., Jiao Dazhi, Riley J., Integrating thesaurus relationships into search and browse in an online photograph collection, Library Hi Tech, 23(3): 425-452, 2005.

Desmontils E., Jacquin C., Indexing a web site with a terminology oriented ontology, The Emerging Semantic Web, IOS Press, pages 181-198, 2002.

Egothor, the official website of Egothor, http://www.egothor.org/, 2005.

Electronic Data Information Source (EDIS), a website of the EDIS system in the Institute of Food and Agricultural Sciences, http://edis.ifas.ufl.edu, 2005.

Encarta dictionary, the website of the Encarta dictionary, http://encarta.msn.com/encnet/features/dictionary/dictionaryhome.aspx, 2005.

Enser P. G., Query analysis in a visual information retrieval context, Journal of Document and Text Management, 1: 25-52, 1993.

Gomez, P., Evaluation of taxonomic knowledge in ontologies and knowledge bases, Banff Knowledge Acquisition for Knowledge-Based Systems, KAW'99, University of Calgary, Alberta, Canada, 1999.

Google Image Search, the website of the Google image search interface, http://www.google.com/imghp?hl=en&tab=wi&q=, 2005.

Grosky W. I., Multimedia information systems, IEEE Multimedia, 1(1): 12-24, 1994.

Gruber, T. R. Towards principles for the design of ontologies used for knowledge sharing, in Formal ontology in conceptual analysis and knowledge representation, N. Guarino and R. Poli, editors, Kluwer Academic Publishers, 1993.

Gruber, T. R. A translation approach to portable ontologies, Knowledge Acquisition, 5(2): 199-220, 1993.

Hersh, S. P., Donohoe L., Assessing thesaurus-based query expansion using the UMLS Metathesaurus, Journal of the American Medical Informatics Association, 2000.

Hyvönen E., Saarela S., Viljanen K., Ontology based image retrieval, Proceedings of WWW2003, Budapest, poster papers, 2003.

Insect Images, the website of Insect Images, http://www.insectimages.org/, 2005.

Jacob, K. E. Ontologies and the semantic web, Bulletin of the American Society for Information Science and Technology, April/May 2004: 19-21.

Kalyanpur A., Hashmi N., Golbeck J., Parsia B., Lifecycle of a casual web ontology development process, The Proceedings of the World Wide Web Conference, 2004.

Kaplan, R., A general syntactic processor, In Rustin, R. (Ed.), Natural Language Processing, Englewood Cliffs, NJ: Prentice-Hall, 1973.

Kay, M., The MIND system, In Rustin, R. (Ed.), Natural Language Processing, Englewood Cliffs, NJ: Prentice-Hall, 1973.

Kay, M., Algorithm schemata and data structures in syntactic processing, Xerox Palo Alto Research Center, 1980.

Kohler J., Schulze-Kremer S., The Semantic Metadatabase (SEMEDA): Ontology-based integration of federated molecular biological data sources, In Silico Biology 2, 0021, 2002.

Koskela M., Laaksonen J., Laakso S., Oja E., The PicSOM retrieval system: description and evaluations, In The Challenge of Image Retrieval, Brighton, UK, 2000, http://www.cis.hut.fi/picsom/publications.html.

Lauser B., Wildemann T., Poulos A., Fisseha F., Keizer J., Katz S., A comprehensive framework for building multilingual domain ontologies: creating a prototype biosecurity ontology, Proc. Int. Conf. on Dublin Core and Metadata for e-Communities, 113-123, 2002.

Maedche A., Neumann G., Staab S., Bootstrapping an ontology-based information extraction system, Studies in Fuzziness and Soft Computing, Springer, 2002.

Matthews, B. M., Miller, K., Wilson, M. D., A thesaurus interchange format in RDF, In Proc. The 1st International Semantic Web Conference (ISWC2002), June 9-12, 2002, Sardinia, Italy, http://www.limber.rl.ac.uk/External/SW_conf_thes_paper.htm.

Muller, H. M., Kenny, E. E., Sternberg, E. W., Textpresso: An ontology-based information retrieval and extraction system for biological literature, PLoS Biology, 2(11): e309, 2004.

Niles, I., and Pease, A., Towards a standard upper ontology, In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Chris Welty and Barry Smith, eds., Ogunquit, Maine, October 17-19, 2001.

Noy, N. F. and McGuinness, D. L. Ontology development 101: a guide to creating your first ontology, Stanford Knowledge Systems Laboratory Technical Report KSL-01-05, March 2001.

Olston C., Chi C., ScentTrails: Integrating browsing and searching on the web, ACM Transactions on Computer-Human Interaction, 10(3): 177, 2003.

Ontology of Science, the website describing the science ontology, http://protege.stanford.edu/plugins/owl/owl-library/tambis-full.owl, 2005.

OWL, the OWL Web Ontology Language guide, http://www.w3.org/TR/2004/REC-owl-guide-20040210/, 2005.

Parsia B., Sirin E., Kalyanpur A., Debugging OWL ontologies, In The 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 2005.

Pellet OWL reasoner, the official website of the Pellet reasoner, http://www.mindswap.org/2003/pellet/, 2005.

Phillips W., Riloff E., Exploiting strong syntactic heuristics and co-training to learn semantic lexicons, Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), 2002.

Plant Diagnostic Information System (PDIS), a website of PDIS, http://www.pdis.org/, 2005.

Protégé, the website of the Protégé ontology editor, http://protege.stanford.edu/, 2005.

Rosch E., Principles of categorization, in Cognition and categorization, E. Rosch and B. B. Lloyd, editors, Hillsdale, NJ: Lawrence Erlbaum Publishers, 27-48, 1978.

Salton G., McGill M., Introduction to modern information retrieval, McGraw-Hill, New York, NY, 1983.

Seiffert R., Chart-parsing of unification-based grammars with ID/LP-rules, LILOG-Report 22: 1-19, 1987.

Semantic Web, the website of the Semantic Web, http://www.w3.org/2001/sw/, 2005.

Taghva K., Borsack J., Condit A., The effectiveness of thesauri-aided retrieval, In Proc. IS&T/SPIE 1999 Intl. Symp. on Electronic Imaging Science and Technology, San Jose, CA, January 1999.

Technical Advisory Service for Images (TASI), A review of image search engines, http://www.tasi.ac.uk/resources/searchengines2003.html, 2005.

Texas Agricultural Extension Service, the website showing 95 image captions provided by the Texas Agricultural Extension Service, http://insects.tamu.edu/extension/bulletins/imagesb-933.html, 2005.

TouchGraph, the official website of TouchGraph, http://www.touchgraph.org, 2005.

Tsinaraki C., Polydoros P., Kazasis F., Christodoulakis S., Ontology-based semantic indexing for MPEG-7 and TV-Anytime audiovisual content, Special issue of Multimedia Tools and Applications Journal on Video Segmentation for Semantic Annotation and Transcoding, 26: 299-325, Aug. 2005.

UMLS Metathesaurus, the homepage of the UMLS Metathesaurus, http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html, 2005.

Uschold M., Gruninger M., Ontologies: Principles, methods and applications, Knowledge Engineering Review, 11(2), 1996.

Wielinga B., Schreiber G., Wielemaker J., Sandberg J. A. C., From thesaurus to ontology, International Conference on Knowledge Capture, Victoria, Canada, October 2001.

Yee K., Swearingen K., Li K., Hearst M., Faceted metadata for image search and browsing, In Proceedings of the Conference on Human Factors in Computing Systems, 2003.

BIOGRAPHICAL SKETCH

I was born in a small town near Seoul, South Korea, in 1973. I received a bachelor's degree in agriculture from the Department of Agricultural Biology at Korea University, South Korea, and a Master of Science degree from the Department of Microbiology at Seoul National University, South Korea. I will receive the Ph.D. degree from the Department of Agricultural and Biological Engineering at the University of Florida in December 2005. I have one daughter, Bonny Koo.


Permanent Link: http://ufdc.ufl.edu/UFE0013105/00001

Material Information

Title: Using a Crop-Pest Ontology to Facilitate Image Retrieval
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0013105:00001













USING A CROP-PEST ONTOLOGY TO FACILITATE IMAGE RETRIEVAL


By

SOONHO KIM

















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2005





























Copyright 2005

by

Soonho Kim

































To my father and mother.
















ACKNOWLEDGMENTS

My first thanks go to my adviser, Dr. Howard W. Beck. His kindness, enthusiasm, and support made all the difference in my academic career. I had no background knowledge of agricultural information before I met him. He taught me, from scratch, many of the basic components my research required, such as Java, the concepts of object-oriented database management systems, artificial intelligence, and ontologies. Every Friday for five years, Dr. Beck met with me to discuss my research and give me advice. I could not have finished my dissertation without him.

My thanks also go to Dr. James J. Jones, who served on my committee. I learned how to build a biological system in his course. He taught me participatory design for developing applications for agricultural users. Whenever I implemented applications in my research, I kept in mind the importance of considering users and communicating with them.

I thank Dr. Fedro Zazueta, who served on my committee. He opened my eyes to the roles of information technology in agricultural extension services. He introduced me to current information technology in agricultural extension services in the U.S. In addition, he helped me learn how my research could contribute to improving extension services in South Korea through information technology.

I thank Dr. Tim Momol, who served on my committee. He helped me greatly in understanding concepts about crops and pests and introduced me to plant pathology. This knowledge was the basis for developing the crop-pest ontology, which is a major component of my research.









I thank Dr. Joachim Hammer, who served on my committee. He provided guidance on the basic concepts of database management systems and the Semantic Web.

I also wish to thank the other members of my academic community at the University of Florida, particularly Dr. Daniel W. Lee and Mary Hall. I am deeply grateful to my friends (Mcnair Bostic, Rohit Badal, Yeonchul Jeong, and Chris Davison) for their support during a difficult period for our field.

Outside my academic community, I shared many tears and laughs with Angela Brammer, who corrected the grammar of this dissertation. She did her best to help me finish it. I thank Inok Kim, who takes care of my baby. She cares for my daughter as if she were her own third child. I know I am very lucky to have met her in Gainesville.

The dissertation was a long process, and much has changed in this time. Bonny Koo, my daughter, was born during this period, and she is a sweet hope for the future. Whenever I felt that I was not smart enough to write a dissertation, she made me keep trying. I really thank my husband, Jawoo Koo. He is my best friend and my partner. My mother-in-law, Kyungsun Whang, and my father-in-law, Chiwhe Koo, encouraged me to pursue my Ph.D. degree and gave me a great deal of help in finishing this dissertation.

Finally, I thank my father, Jongko Kim, and my mother, Bokrae Kim. They put all their effort into taking care of me throughout their lives. Had they not, I could not have become a Ph.D. I thank my brothers, Jaeho Kim and Kwangho Kim, as well. I really thank God, who makes me happy every moment.

















TABLE OF CONTENTS



ACKNOWLEDGMENTS .......... iv

LIST OF TABLES .......... ix

LIST OF FIGURES .......... x

ABSTRACT .......... xii

CHAPTER

1 INTRODUCTION

Statement of Problems
Limitation of Finding Relevant Images .......... 2
Limitation of Helping Users to Find Proper Keywords .......... 4
Image Retrieval Using a Thesaurus .......... 5
An Approach to Image Retrieval Using an Ontology .......... 6
Related Works in the Agriculture Field .......... 9
Contributions .......... 10
Overview of Chapters .......... 10

2 THE CROP-PEST ONTOLOGY .......... 12

Introduction .......... 12
Terminology .......... 12
OWL .......... 13
Component of the Crop-Pest Ontology .......... 13
Methodology for Building the Crop-Pest Ontology .......... 14
Purpose and Domain of the Crop-Pest Ontology .......... 16
Consideration of Reuse of Existing Ontologies .......... 17
Enumeration of Important Terms in the Crop-Pest Ontology .......... 18
Building Classes and the Class Hierarchy .......... 19
Defining Properties .......... 21
Creating Individuals .......... 22
The Crop-Pest Ontology .......... 24
Validation of the Crop-Pest Ontology .......... 24
Evaluation of the Crop-Pest Ontology .......... 24
Conclusion and Discussion .......... 27









3 A PRACTICAL COMPARISON BETWEEN THESAURUS AND ONTOLOGY
TECHNIQUES AS A BASIS FOR SEARCH IMPROVEMENT .......... 29

Introduction .......... 29
The National Agricultural Library Thesaurus .......... 30
Relationships between Terms .......... 31
Comparative Analysis .......... 32
Representing Domain Knowledge .......... 33
Concepts, Semantic Relationships, and Their Logical Consistency .......... 33
The Ability to Represent Complicated Concepts .......... 36
Reasoning Based on Representation .......... 39
Reasoning Facilities .......... 39
Searching Documents .......... 41
Automatic Validation of Logical Consistency .......... 42
Conclusion .......... 44

4 BUILDING A DATABASE AND GRAPHICAL USER INTERFACE FOR
BROWSING IMAGES BASED ON THE CROP-PEST ONTOLOGY .......... 46

Introduction .......... 46
Ontology-Based Image Indexing .......... 47
Creating Concepts for Indexing Images in the Crop-Pest Ontology .......... 48
The Indexing Process .......... 49
Step 1: Syntactic and Semantic Analysis of Each Image Caption .......... 49
Step 2: Creating an Individual of the Image in the Crop-Pest Ontology .......... 50
Step 3: Filling in the Values of Properties for Each Individual .......... 50
Step 4: Connecting Assigned Values into Classes/Individuals in the Crop-Pest Ontology .......... 50
Step 5: Saving the Individual into the Crop-Pest Ontology .......... 51
An Interface for Browsing Images .......... 51
Goals .......... 51
Features to be Supported .......... 52
The Graphical Interface .......... 52
Usability Study .......... 54
The Keyword-Based Search Interface .......... 55
Hypotheses to Be Tested .......... 55
Design and Procedure .......... 55
Results .......... 56
Conclusion .......... 57

5 ONTOLOGY-ASSISTED INFORMATION EXTRACTION .......... 67

Introduction .......... 67
Components in Information Extraction System .......... 68
Ontology .......... 69
Phrase Patterns .......... 69
Island Chart Parser .......... 71










Semantic Structure ................ .. ...... .. ...............72
Results of the Information Extraction System........... ............... ...............73
Conclusion............................ ................................. .......75

6 CONCLUSION AND FUTURE DIRECTIONS....................................76

APPENDIX

A 29 1 IM A G E C A P T IO N S ....................................................................................... 80

B A LIST OF WORDS APPEARING ON 291 IMAGE CAPTIONS...........................87

C 138 TERMS USING THE EVALUATION OF THE CROP PEST ONTOLOGY
AND THE AGROVOC ONTOLOGY ........................... ......... 90

D THE CROP-PEST ONTOLOGY WRITTEN IN OWL...........................................92

E QUESTIONAIRE ON EVALUATION .............................. ...............145

REFERENCES .......................................... ... .. .. ......... 148

BIOGRAPHICAL SKETCH .................... ........ ........ ........153
















LIST OF TABLES


Table page

1-1 Mean Precision and Relative Recall of search engines during 2004 .........................3

2-1 The coverage of 138 tested terms in the crop-pest and AGROVOC ontologies .......... 27

4-1 The result of the question "How many empty results were found when using this
interface?" ...................... .. .......... ............. 60

4-2 Participants' responses to the question "Did this interface help you learn
more about crops, pests, and relationships between them?" .......... 60
















LIST OF FIGURES


Figure page

1-1 An illustration of precision and recall. Precision is expressed as the percentage
of retrieved documents that are relevant. .......... 2

2-1 Image captions in the image collection .......... 16

2-2 Concepts from SUMO that compose the upper level of the crop-pest ontology .......... 18

2-3 The ten most frequently appearing terms in the 291 image captions .......... 19

2-4 Class hierarchy from the root class "thing" down to the class "insect" and its
subclass "southern green stink bug," showing top-level, middle-level and
bottom-level classes .......... 20

2-5 The object property "has_developmental_stage_of" and the datatype property
"number of legs" in the class "insect" .......... 22

2-6 An individual of the class "soybean." All properties are filled with values .......... 23

2-7 Results of the consistency check of the crop-pest ontology using the Pellet OWL
reasoner .......... 24

3-1 Description of the term "plants" in NALT ("..." denotes additional terms not shown) .......... 34

3-2 Description of the concept Plant from the crop-pest ontology, written in Web
Ontology Language (OWL) .......... 35

3-3 Three terms, corn earworm, peanut and larva, in the NALT thesaurus .......... 37

3-4 Properties assigned to the class corn earworm for describing the concept "large corn
earworm larva on peanut leaf" .......... 38

3-5 The OWL abstract form for the individual of the class "corn earworm" representing
the complicated concept "large corn earworm larva on peanut leaf" .......... 39

3-6 The OWL abstract form for the inference "corn earworm is a peanut pest" .......... 40

3-7 Screen shot of the preferred term "plant pests" in NALT .......... 42

3-8 Screen shots of the OWL consistency checker and the results in Pellet .......... 43

4-1 Hierarchy from the root class "thing" to "digital photograph" .......... 60

4-2 Overview of indexing images associated with the crop-pest ontology .......... 61

4-3 Syntactic and semantic analysis of the image caption "damage caused by three
cornered alfalfa hopper on soybean" .......... 61

4-4 The interface that provides a facility to browse 291 images .......... 62

4-5 Expanding a node (orange rectangle) shows an image .......... 62

4-6 A facility for showing properties. A property is shown by an edge with a label.
The thick end of each edge represents the domain .......... 63

4-7 The facility to show the selected images with related concepts in the crop-pest
ontology .......... 64

4-8 The facility to show all images related to a particular class in the crop-pest
ontology .......... 65

4-9 Screenshot of the keyword-based search interface implemented with Egothor .......... 66

5-1 The information extraction system in the crop-pest domain .......... 69

5-2 The hierarchical structure of phrase patterns for the concept "plant part" .......... 70

5-3 Components of the island chart parser with a simple input string .......... 73

5-4 A parse tree (left) and the semantic structure (right) based on the parse tree for
the caption "cotton stainer adult in a white cotton bloom" .......... 74
















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

USING A CROP-PEST ONTOLOGY TO FACILITATE IMAGE RETRIEVAL

By

Soonho Kim

December 2005

Chair: Howard W. Beck
Major Department: Agricultural and Biological Engineering

Professionals in the agricultural field, such as growers, Extension agents, and

researchers, need a facility to organize and locate photographic images related to their

work, especially as the volume of such images continues to increase. However, current

keyword-based image retrieval suffers from relatively low precision and recall. This
research proposes a new approach to image retrieval in the agricultural field that uses an
ontology to help users find appropriate images: users browse images associated with
formal descriptions of the meanings of words and the relationships between them. Two
hundred and ninety-one images from a particular domain, crops and related pests, were
used to develop the approach. A "crop-pest ontology" was created to represent the
concepts describing the images. The ontology contains crops and related pests, the
relationships between them, and the environmental factors affecting them. The crop-pest
ontology was compared with the existing National Agricultural Library Thesaurus
(NALT) to identify the similarities and differences between an ontology and a thesaurus.
The comparison shows that the crop-pest ontology has better formal representation
capabilities, which avoid ambiguity and support inferences that are not possible in a
thesaurus such as NALT. To enable browsing of images associated with the crop-pest
ontology, images

were indexed based on the ontology. The indexing process included manual syntactic and

semantic analyses of each image caption, but such an analysis has a high labor cost.

Therefore, a process of semi-automatic analysis was designed using natural language-

based information extraction techniques which include a parser, a grammar described by

phrase patterns, and the crop-pest ontology. A graphical interface was implemented for

browsing images associated with concepts in the crop-pest ontology. A usability study
indicates that participants encountered fewer empty results when retrieving images with
the crop-pest ontology. It also shows that image retrieval using the crop-pest ontology
helps users find relevant images by transferring domain knowledge to them. This
research 1) shows the development of the crop-pest ontology, 2) analyzes the differences
and similarities between an ontology and a thesaurus, 3) explores automatic information
extraction as a way to reduce the manual labor required for ontology-based indexing,
and 4) develops a new approach that uses the ontology to index and browse images, so
that professionals in the agricultural field can retrieve images more easily and
accurately.














CHAPTER 1
INTRODUCTION

Images are a major component of agricultural information systems. As the number

of available images increases rapidly, finding relevant images in a timely and efficient

manner becomes more difficult. Agricultural professionals such as growers, Extension

agents, and researchers need a facility to retrieve images in a collection more easily in the

agricultural domain. There are two standard approaches to image retrieval: content-based

and text-based. In content-based image retrieval, images are searched using features such

as color, texture, shape and spatial location. An example of this approach is the PicSOM

system (Koskela, 2000). In text-based retrieval, searches are based on textual descriptions

such as image captions. Since content-based retrieval is still not suitable for most

applications, online image-retrieval engines such as Google (Google, 2005) employ text-

based image retrieval.

Statement of Problems

A typical method of text-based image retrieval uses keywords.

Keyword-based image retrieval is an approach that retrieves text such as image captions

or descriptions of images by using indexes of words appearing in the text. In its simplest

form, a search engine indexes every word occurring in every piece of text associated with

images in a collection to be searched. Users describe their interests through one or more

keywords. If the keywords appear in indexes, the search engine shows images containing

those keywords. This keyword-based image retrieval approach has two general

limitations (Hyvonen et al., 2003).









* Limitation of finding relevant images: Appearance of a keyword in a text does
not necessarily mean that the text is relevant to the user's interest. Relevant text
may not necessarily contain the explicit keyword typed by the user.

* Limitation of helping users to find proper keywords: The user does not
necessarily know what keyword to type to find images. The keyword-based
approach is not useful unless the user is familiar with what kinds of images are in a
collection and knows what terms are used to describe relevant images.
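The keyword-based approach described above can be sketched as an inverted index that maps each word to the images whose captions contain it. The sketch below is a minimal, hypothetical illustration (the function names and the three captions are assumptions, and real search engines add stemming, ranking, and more); it shows how both limitations arise: a caption about a caterpillar is missed by the query "larva," and the query "worm" finds nothing because "earworm" is indexed as a different word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(captions):
    """Inverted index: map each word to the set of image IDs whose caption contains it."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for word in tokenize(caption):
            index[word].add(image_id)
    return index


def keyword_search(index, query):
    """Return the images whose captions contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical captions, loosely modeled on examples in this dissertation.
captions = {
    "img1": "Large corn earworm larva on peanut leaf",
    "img2": "Stink bug damage on cotton leaf",
    "img3": "Velvetbean caterpillar on soybean",
}
index = build_index(captions)
print(keyword_search(index, "larva"))  # {'img1'}: img3 also shows a larva but never says so
print(keyword_search(index, "worm"))   # set(): "earworm" is indexed as a different word
```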

Limitation of Finding Relevant Images

The limitation of finding relevant images is formally measured in terms of recall

and precision, illustrated in Figure 1-1. Recall is the ratio of the number of relevant

documents retrieved to the total number of relevant documents in the collection.

Precision is the ratio of the number of relevant documents retrieved to the total number of

documents retrieved. Recall and precision are usually expressed as percentages.


[Figure 1-1: two overlapping sets, the set of relevant documents in the collection and the set of documents retrieved. A: relevant documents not retrieved; B: relevant documents retrieved; C: irrelevant documents retrieved.]

$$\mathrm{Precision} = \frac{B}{B+C} \times 100, \qquad \mathrm{Recall} = \frac{B}{A+B} \times 100$$

Figure 1-1. An illustration of precision and recall. Precision is expressed as the
percentage of retrieved documents that are relevant. Recall is expressed as the
percentage of relevant documents retrieved out of all the relevant documents
in the collection.
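The two ratios in Figure 1-1 can be computed directly from the set of relevant documents and the set of retrieved documents. The following is a minimal sketch; the function name and the sample image IDs are hypothetical, and empty sets are treated as giving 0 percent to avoid division by zero.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) as percentages, following Figure 1-1.

    B = relevant documents retrieved, C = irrelevant documents retrieved,
    A = relevant documents not retrieved.
    """
    relevant, retrieved = set(relevant), set(retrieved)
    b = len(relevant & retrieved)
    c = len(retrieved - relevant)
    a = len(relevant - retrieved)
    precision = 100.0 * b / (b + c) if retrieved else 0.0
    recall = 100.0 * b / (a + b) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 relevant images exist; 5 are retrieved, 3 of them relevant.
relevant = {"img1", "img2", "img3", "img4"}
retrieved = {"img1", "img2", "img3", "img7", "img9"}
print(precision_recall(relevant, retrieved))  # (60.0, 75.0)
```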









The most desirable retrieval approach would be one with high precision and recall.
Such a retrieval approach would find all, and only, the images that are relevant to a user's
interest. Blair reported that in a practical evaluation of a document retrieval system
containing roughly 350,000 pages of text, the average recall was less than 20 percent of
the text relevant to a particular search (Blair et al., 1985). The precision values ranged
from 19.6 percent to 100 percent. Blair stated that recall was low because keyword-based
retrieval is difficult to use when pieces of text must be retrieved by subject. Shafi
reported that the precision and recall of three search engines, AltaVista (AltaVista,
2005), Google (Google, 2005) and HotBot (HotBot, 2005), were less than 30 percent, as
shown in Table 1-1. The search engines were evaluated by taking the first ten results
returned for scholarly information to estimate precision and recall.
Table 1-1. Mean Precision and Relative Recall of search engines during 2004

                    AltaVista   Google   HotBot
Precision              27%        29%      28%
Relative recall        18%        20%      29%

The design of keyword-based retrieval rests on the false assumption that users can easily
predict the exact words and phrases that appear in the texts they would find most useful
(and only in those texts). This assumption comes from the basic but flawed idea that the
"statistical aspects" of words, such as their occurrence, location, and frequency, can be
used to predict their meanings comprehensively. Therefore, one way of achieving higher
precision and recall would be to take the meanings of words into consideration.
Understanding the following characteristics of words can help the underlying retrieval
method adapt to the meanings of words:

* Words can have several meanings. For example, the word "beetle" can mean an
insect belonging to a large order characterized by a modified outer pair of wings
that forms a hard covering for the inner pair (Encarta, 2005). The word "beetle" can
also refer to a car manufactured by Volkswagen that has a shape reminiscent of the
insect.









* Different words can have the same meaning. For example, the word "worm" can
mean an elongated, soft-bodied insect. The word "larva" can mean the wingless,
elongated, soft-bodied immature form of an insect that hatches from an egg. The
different words "worm" and "larva" can therefore point to the same meaning.

* Words can have a wide variety of different associations. For example, the word
"beetle" has an association with the word "soybean" because a beetle is a pest of
soybeans. In addition, the word "beetle" is associated with the words "egg,"
"larva," "pupa," and "adult," which are all names of developmental stages in a
beetle's life.
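The three characteristics above suggest matching on concepts rather than on raw words. As a hedged illustration only, the sketch below maps words to concept identifiers through a small lexicon and uses concept associations to pick a sense from context. The lexicon, the association table, and the identifiers such as "Insect:Beetle" are hypothetical; they are not taken from the crop-pest ontology.

```python
# Hypothetical lexicon: each word maps to one or more concept identifiers.
LEXICON = {
    "beetle": {"Insect:Beetle", "Car:VolkswagenBeetle"},  # one word, several meanings
    "worm": {"Stage:Larva"},                              # different words ...
    "larva": {"Stage:Larva"},                             # ... with the same meaning
    "soybean": {"Plant:Soybean"},
}

# Hypothetical associations between concepts (pest of a crop, life stages).
ASSOCIATIONS = {
    "Insect:Beetle": {"Plant:Soybean", "Stage:Egg", "Stage:Larva",
                      "Stage:Pupa", "Stage:Adult"},
}


def concepts_for(word, context=frozenset()):
    """Return the concepts a word may denote.

    If the word is ambiguous and context concepts are given, keep only the
    senses associated with at least one context concept (when any are).
    """
    senses = LEXICON.get(word.lower(), set())
    if len(senses) > 1 and context:
        linked = {s for s in senses if ASSOCIATIONS.get(s, set()) & context}
        if linked:
            return linked
    return set(senses)


print(concepts_for("worm") == concepts_for("larva"))      # True
print(concepts_for("beetle", context={"Plant:Soybean"}))  # {'Insect:Beetle'}
```

With concept-level matching, a request for "worm" can retrieve captions that say "larva", and a request for "beetle" in a soybean context avoids the car sense.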

Limitation of Helping Users to Find Proper Keywords

The limitation of helping users to find proper keywords can be addressed by

providing a facility of browsing images associated with well-structured knowledge in a

particular field. Yee reported that current keyword-based retrieval such as Google Image

Search (Google Image Search, 2005) did not allow users to browse images (Yee, 2003).

Markkula reported that professionals in artistic fields such as journalism, design, and art

direction use browsing as a basic strategy in searching for images. The reason is that

some words describing selected images may be difficult to express freely as search

keywords but are easily applied when the images are seen. Similarly, growers and other

professionals in the agricultural field would want to be able to browse images as well as

search for them with keywords.

A facility that shows users the relationships within an image collection can address
the limitation of keyword-based image retrieval in supporting users, by describing the
knowledge conveyed by the images. The result set generated by keyword-based retrieval
may miss interesting aspects of an image collection, because the images in the
collection are related to each other in many ways. For example, an image presenting
"stink bug damage on cotton leaf" can be retrieved in the result set for the keyword
"stink bug." However, the interesting relationship between "cotton leaf" and "damage"
cannot be shown to users, even though such relationships might give users a clue for
finding relevant images.

Image Retrieval Using a Thesaurus

A thesaurus is a list of terms related to a particular subject that records, for each term,
the terms related to it. Its primary purposes are indexing documents and helping users
retrieve information more easily. It is organized in a hierarchical structure based on the
following interrelationships of the terms:

Broader Term (BT): A particular term is more general than another term.

Narrower Term (NT): A particular term is more specific than another.

Related Term (RT): Two terms are associated.

Use For (UF): A particular term is the preferred term among a set of synonymous

terms.
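The four relationships above can be held in a small data structure. The sketch below is a minimal, hypothetical illustration: the class name `Thesaurus` and the sample terms are assumptions and are not taken from NALT or any other real thesaurus. It records BT and NT as inverses of each other, treats RT as symmetric, and stores UF as a mapping from each non-preferred synonym to its preferred term.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal thesaurus with BT/NT, RT, and UF relationships (illustrative only)."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> broader terms (BT)
        self.narrower = defaultdict(set)  # term -> narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)
        self.preferred = {}               # non-preferred synonym -> preferred term

    def add_bt(self, term, broader_term):
        # BT and NT are inverses: recording one direction records the other.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_rt(self, a, b):
        # RT is symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_uf(self, preferred_term, synonym):
        # "preferred_term UF synonym": index and search under preferred_term.
        self.preferred[synonym] = preferred_term

    def normalize(self, term):
        """Replace a non-preferred synonym with its preferred term."""
        return self.preferred.get(term, term)


# Hypothetical terms for illustration.
t = Thesaurus()
t.add_bt("stink bugs", "insect pests")
t.add_bt("insect pests", "plant pests")
t.add_rt("stink bugs", "cotton")
t.add_uf("insect pests", "pest insects")
print(t.normalize("pest insects"))  # insect pests
print(t.narrower["plant pests"])    # {'insect pests'}
```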

One of the main contributions of a thesaurus to image retrieval can be the reformulation
and expansion of users' requests to address the low precision and recall of keyword-based
image retrieval. A thesaurus could be used to retrieve more relevant images by expanding
users' requests with related terms (RT), which might increase recall. In addition, a
thesaurus could be used to avoid retrieving non-relevant information by narrowing
requests with narrower terms (NT), which might increase precision. Approaches to
retrieving images with a thesaurus have been developed to address the low precision and
recall of keyword-based image retrieval by considering the interrelationships between
terms. Dalmau reported that integrating thesaurus relationships into searching and
browsing an online photograph collection significantly improved users' discovery
experience (Dalmau et al., 2005).

When users' requests were found in the thesaurus, the result page provided search
suggestions based on broader terms (BT) or narrower terms (NT) so that users could
broaden or refine a result set. In addition, when a user's request matched a term in the
thesaurus, the search also retrieved results for all narrower terms of that request
(Dalmau, 2005). In contrast, Hersh assessed the expansion of users' requests with
thesaurus relationships as a way to improve search performance (Hersh et al., 2000).
Queries on a test collection were expanded using the synonym, BT, NT, and RT
relationships in the Unified Medical Language System (UMLS) Metathesaurus (UMLS
Metathesaurus, 2005). Hersh reported that thesaurus-based query expansion generally
caused a decline in retrieval performance.
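To make query expansion concrete, the sketch below expands a request with all of its narrower terms, transitively, in the spirit of the search Dalmau describes, and optionally with related terms. The term relationships and function names are hypothetical and not drawn from NALT or UMLS. The second example hints at one plausible way expansion can hurt: adding the related term "cotton" to "stink bugs" would also match cotton images that show no stink bugs.

```python
from collections import deque

# Hypothetical NT and RT relationships (not from any real thesaurus).
NARROWER = {
    "plant pests": ["insect pests", "plant pathogens"],
    "insect pests": ["stink bugs", "caterpillars"],
    "caterpillars": ["corn earworm"],
}
RELATED = {"stink bugs": ["cotton"]}


def all_narrower(term):
    """Collect every narrower term of `term`, transitively, breadth-first."""
    seen, queue = [], deque(NARROWER.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.append(t)
            queue.extend(NARROWER.get(t, []))
    return seen


def expand_query(term, use_related=False):
    """Expand a query term with all narrower terms and, optionally, related terms."""
    expanded = [term] + all_narrower(term)
    if use_related:
        for t in list(expanded):
            for r in RELATED.get(t, []):
                if r not in expanded:
                    expanded.append(r)
    return expanded


print(expand_query("insect pests"))
# ['insect pests', 'stink bugs', 'caterpillars', 'corn earworm']
print(expand_query("stink bugs", use_related=True))
# ['stink bugs', 'cotton']
```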

A thesaurus can also be used as a tool to browse images. Dalmau argued that a thesaurus
directs users to more access points for each image, and that using a thesaurus to browse
images provides disambiguation. However, Chun stressed that a thesaurus has a limited
number of relations, which gives it relatively meager expressiveness for representing
specific knowledge and can result in ambiguous relations (Chun, 2004).

An Approach to Image Retrieval Using an Ontology

A new approach to image retrieval using an ontology is introduced to deal with the two
limitations of keyword-based image retrieval: 1) finding relevant images and 2) helping
users to find proper keywords. The word ontology originally comes from the field of
philosophy, where it refers to the study of existence.¹ Computer scientists have come to
use this term for formally represented knowledge that can be shared and reused in
computer systems (Gruber, 1993). An ontology is defined as a collection of concepts and
their relationships that describes knowledge in a particular domain. An ontology is
described in a formal way that makes its concepts understandable to a machine. A concept is


¹ http://www-ksl.stanford.edu/kst/what-is-an-ontology.html









a set of things that we perceive in the world. A concept has a set of properties that must
be true of each member of the set denoted by the concept. A concept can be represented
by one of three formal elements in an ontology: an individual, a class, or a property:

* Individual: An individual is defined as a real object in the world.

* Classes and subclasses: A class defines a set of enumerated individuals that
belong together according to their common properties. A class is a subclass of
another class when every individual belonging to it also satisfies the conditions
that define the other class. Subclasses satisfy the requirements of their superclass
and add further restrictions; superclasses generalize the common properties of
their subclasses.

* Properties: Properties are defined as relationships between individuals or between
individuals and data values (such as strings and integers).

* Domain: A domain of a property is defined as a set of individuals to which the
property is applied.

* Range: A range of a property is defined as a set of individuals that the property has
as its value.

Concepts in the crop-pest domain, which includes crops and related pests, can be
described in a crop-pest ontology. This ontology includes concepts such as plants, pests,
relationships between plants and insects such as damage, and environmental concepts
such as soil. The concept "plant" is described using the class "plant." The particular
concept "damages" can be represented as a property between "insect" and "plant."
Classes, properties, and individuals are described in a formal, machine-readable
language, so that software can process them and draw conclusions from them without
human interpretation.
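As a rough illustration of how classes, subclasses, individuals, and properties fit together, the sketch below encodes a tiny fragment of a crop-pest hierarchy in plain Python. The helper types `OntClass` and `Individual`, the class chain, and the individuals are assumptions made for illustration; the actual crop-pest ontology is written in OWL, not Python. It shows the key deduction a machine can make: an individual of the class "soybean" is also a plant, an organism, and a thing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class OntClass:
    """A named class; `parent` is its direct superclass (None for the root)."""
    name: str
    parent: Optional["OntClass"] = None

    def ancestors(self):
        """Return the names of this class and all its superclasses, nearest first."""
        chain, c = [], self
        while c is not None:
            chain.append(c.name)
            c = c.parent
        return chain


@dataclass(eq=False)
class Individual:
    """A real object in the world, asserted to belong to one class."""
    name: str
    cls: OntClass
    properties: dict = field(default_factory=dict)  # property name -> list of values

    def is_a(self, class_name):
        # Membership in a class implies membership in every superclass.
        return class_name in self.cls.ancestors()

    def add(self, prop, value):
        self.properties.setdefault(prop, []).append(value)


# Hypothetical fragment of a crop-pest class hierarchy.
thing = OntClass("thing")
organism = OntClass("organism", thing)
plant = OntClass("plant", organism)
insect = OntClass("insect", organism)
soybean = OntClass("soybean", plant)
stink_bug = OntClass("southern green stink bug", insect)

soybean_plant = Individual("soybean plant #1", soybean)
bug = Individual("stink bug #1", stink_bug)
bug.add("damages", soybean_plant)  # the "damages" relationship from insect to plant

print(soybean_plant.is_a("plant"), soybean_plant.is_a("organism"), soybean_plant.is_a("insect"))
# True True False
print([v.name for v in bug.properties["damages"]])  # ['soybean plant #1']
```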

The word ontology has also become popularly associated with one idea for the next
generation of the Web, called the Semantic Web (Semantic Web, 2005). The Semantic
Web aims to be a universal medium for information exchange by giving the content of
documents on the Web a meaning that machines can process. Currently, the Web is based
on documents written in HTML, a language that describes a body of structured text with
a focus on the desired visual layout. However, HTML cannot describe the information
contained within the documents themselves. For example, with HTML we can present a
page that lists pesticides; accordingly, the HTML code of this page can make simple,
document-level statements such as "This document's title is 'Pesticides.'" But there is no
facility within HTML itself to relay more complex concepts, such as "AZOXYSTROBIN
is a pesticide with a unit cost of $1.38." Rather, HTML can only say that the text
"AZOXYSTROBIN" should be positioned near the words "Pesticides" and "$1.38." HTML
cannot indicate that AZOXYSTROBIN is a type of pesticide or even assert that $1.38 is a
unit price. The Semantic Web addresses this limitation of HTML by using ontologies and
by extending current Web markup languages; together these will play key roles in
describing the richer semantics of Web documents by providing sources of shared,
precisely defined terms.
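The difference between layout and meaning can be sketched by restating the pesticide example as subject-predicate-object statements, in the spirit of RDF. In the minimal illustration below, the predicate name `unitCost` and the ("USD", 1.38) value encoding are hypothetical choices, and plain Python sets stand in for a real triple store. Once the statements are explicit, a program can answer "which things are pesticides?" or "what is the unit cost of AZOXYSTROBIN?", which the HTML string cannot support.

```python
# What HTML conveys: text placed on a page, with no stated meaning.
html = "<h1>Pesticides</h1><p>AZOXYSTROBIN $1.38</p>"

# The same content as explicit (subject, predicate, object) statements.
# "unitCost" and the ("USD", 1.38) encoding are hypothetical choices.
triples = {
    ("AZOXYSTROBIN", "rdf:type", "Pesticide"),
    ("AZOXYSTROBIN", "unitCost", ("USD", 1.38)),
}


def objects(subject, predicate):
    """All values of `predicate` for `subject`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects_of_type(cls):
    """All subjects declared to be instances of `cls`."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


print(subjects_of_type("Pesticide"))        # {'AZOXYSTROBIN'}
print(objects("AZOXYSTROBIN", "unitCost"))  # {('USD', 1.38)}
```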

The rich semantics of ontologies can support better indexing and retrieval of images.
TextPresso² is a biological publication system that uses ontologies to catalog and
retrieve literature. In the TextPresso system, using an ontology resulted in a threefold
increase in search efficiency in the specific field of biological gene-to-gene interaction
(Muller, 2004). Hyvonen developed an ontology-based image retrieval system for 600
photographs in the Helsinki University Museum (Hyvonen, 2003). The promotion
ontology was used to annotate the images and to support their retrieval. He stressed that
image retrieval using the promotion ontology helped users find relevant images, even
when they initially lacked knowledge about the domain.


² http://www.textpresso.org










Related Works in the Agriculture Field

Existing image repositories in agriculture employ keyword-based image retrieval, a
browsing tool with a few levels of categories, or both. The Plant Diagnostic Information
System (PDIS, 2005) and the Digital Diagnostic and Information System (DDIS, 2005)
provide keyword-based image search, so image search in PDIS and DDIS may suffer
from low recall and precision. In both systems, each image has a searchable text that
describes image annotations, circumstances, and other relevant information. Unless users
are familiar with this text, it can be difficult to find proper keywords to retrieve images.
The Agricultural Research Service provides an image gallery that contains more than
2,000 images (ARS, 2005). It provides keyword-based image search and therefore has
the same limitations as conventional keyword-based search. In addition, it provides nine
simple categories to help users find relevant images: animals, crops, education, field
research, fruits & veggies, illustrations, insects, lab research, and plants. Each category
shows all images classified into it, and there are no subcategories. The nine categories
are too general to describe the whole content of the images, so users can miss content
that might be important to them. Insect Images (Insect Images, 2005) provides
keyword-based retrieval of insect images and supports categories for browsing images
by insect scientific name. However, Insect Images still has a problem with low recall
and precision, since it employs conventional keyword-based search.

Other image repositories in agriculture provide only facilities for browsing images by
category. The Texas Agricultural Extension Service provides browsing of cotton insect
images using a few levels of categories. Those categories are not enough to represent
the content of the images. Users need a tool that represents the content of images more
precisely.

Contributions

The main contributions of this dissertation are summarized as follows:

* A new technology (ontology) was adapted to an agricultural information
system to address the image retrieval problem.

* An ontology describing crops and related pests, called the crop-pest ontology, was
built in the formal Web Ontology Language, OWL. A practical methodology for
building the ontology was developed.

* Based on the crop-pest ontology, image information was manually extracted from
image captions written by a scientist who works on crops and insects, as a first
step toward image indexing.

* Based on the previous results, images associated with the crop-pest ontology were
indexed to enable browsing of the images.

* A new graphical interface was created for browsing images indexed with the crop-
pest ontology.

* Methods of automatic information extraction were explored as a way to reduce the
manual labor required for ontology-based indexing.

Overview of Chapters

Chapter 2 presents the development of the crop-pest ontology, which covers crops,

pests, the relations between them, and the environmental factors surrounding them. Based

on earlier methodologies of ontology building, a practical methodology is introduced for

the agricultural field. In addition, validation and evaluation of the created ontology are

discussed.

Chapter 3 presents the differences between the crop-pest ontology and a similar
approach, the thesaurus. Jacob argued that a controlled vocabulary is itself an ontology,
so long as the standard concept of a controlled vocabulary is similarly redefined (Jacob,
2004). To test Jacob's argument, this chapter presents a comparative analysis between the
crop-pest ontology and a well-known agricultural thesaurus, the National Agricultural
Library Thesaurus (NALT). The analysis compares their representational and inferential
abilities, which lend more power to information retrieval. The result of this comparative
analysis is reported and discussed in terms of addressing the limitations of keyword-based
image retrieval.

Chapter 4 presents the process of indexing each image with the crop-pest ontology.
This process was based on syntactic and semantic analysis. A graphical interface for
browsing images with the crop-pest ontology is introduced, and a preliminary evaluation
is presented.

Chapter 5 presents the process of information extraction, which aided in the

indexing of 150 images that were associated with the crop-pest ontology. Information

extraction is a process that identifies useful information from natural language text

regarding a domain and converts that information to a structured form, which can be

saved into a database. The process uses a parser to map words appearing in each image

caption to concepts in the crop-pest ontology. This mapping process helps build indexes

of images.
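As a hedged illustration of mapping caption words to ontology concepts, the sketch below uses a greedy longest-match lookup in a small hand-made lexicon. This is a deliberately simpler technique than the island chart parser and phrase patterns of Chapter 5, and the lexicon entries and concept labels are hypothetical. Matching the longest phrase first lets "cotton stainer" (an insect) win over "cotton" (a crop).

```python
# Hypothetical lexicon from word sequences to ontology concept labels.
LEXICON = {
    ("cotton", "stainer"): "class:cotton stainer",
    ("cotton",): "class:cotton",
    ("adult",): "stage:adult",
    ("bloom",): "plant part:flower",
    ("white",): "color:white",
}
MAX_LEN = max(len(k) for k in LEXICON)


def tag_caption(caption):
    """Greedy longest-match lookup of caption phrases in the lexicon."""
    words = caption.lower().replace(".", "").split()
    concepts, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            concept = LEXICON.get(tuple(words[i:i + n]))
            if concept:
                concepts.append(concept)
                i += n
                break
        else:
            i += 1  # word not in the lexicon (e.g., "a", "in"): skip it
    return concepts


print(tag_caption("cotton stainer adult in a white cotton bloom."))
# ['class:cotton stainer', 'stage:adult', 'color:white', 'class:cotton', 'plant part:flower']
```

The resulting concept list is only a flat index; recovering relations such as which insect is located in which plant part is what the parser and semantic structures of Chapter 5 are for.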

Chapter 6 summarizes conclusions and identifies future directions.














CHAPTER 2
THE CROP-PEST ONTOLOGY

Introduction

An ontology represents domain knowledge using concepts and relationships

expressed in a formal, machine-processable language. Building a domain-specific

ontology is a process of capturing domain knowledge using this formal language.
Defining the purpose and intended uses of an ontology first is a crucial step. The
crop-pest ontology was built to facilitate image retrieval in a collection of images taken
by a scientist who works on crops and pests at the University of Florida. The collection
contains 291 images that show three crops (soybean, peanut, and cotton) and related
insects that damage them. The scope of the crop-pest ontology covers at least the domain
knowledge contained in the image collection.

This chapter will introduce a methodology for developing the crop-pest ontology

and describe it using specific examples in each step. In addition, the created crop-pest

ontology will be shown. Then the validation and evaluation of the ontology will be

described.

Terminology

Before the procedure of building the ontology can be discussed, components of the

crop-pest ontology and some terminology used during the development of the crop-pest

ontology must be introduced.









OWL

OWL is an acronym for Web Ontology Language, a semantic markup language for
publishing and sharing data using ontologies on the Web. OWL has three sublanguages:
OWL Lite, OWL DL, and OWL Full. OWL provides machine-processable information on
the Web. The crop-pest ontology is written in OWL, which gives it its formal
representation.

Components of the Crop-Pest Ontology

The crop-pest ontology consists of classes, subclasses, properties, subproperties,

domains, ranges and individuals in a hierarchical structure.

* Individual: An individual is defined as a thing we perceive in the world. For
example, when one sees a green plant in a field, the specific green plant observed
can be assigned as an individual.

* Classes and subclasses: A class defines a set of enumerated individuals that
belong together according to their common properties. For example, a class
Organism can be defined as any individual that has six common properties:
movement, feeding, respiration, growth, reproduction and sensitivity to stimuli.
These six properties are called necessary conditions for membership in the class
Organism, and they keep the ontology logically consistent. Thus, a virus is not an
organism, because a virus cannot reproduce itself without a host.
Classes are organized in a hierarchical structure using subclasses. A class is a
subclass of another class whenever its members satisfy the necessary conditions of
the other class. For example, a class Plant could be defined as a subclass of the
class Organism. From this statement, we can deduce that if an individual is a plant,
then it is also an organism. Subclasses satisfy the requirements of their superclass
and add further restrictions; superclasses generalize the common properties of
their subclasses. All classes in the crop-pest ontology are subclasses of Thing,
the single root of the crop-pest ontology.

* Properties and subproperties: Properties are relationships between individuals
or between individuals and data values (such as strings and integers). For
example, properties of the class Organism can include has parts, locate in, cause
damage to, has color, and has age. Properties fall into two categories: those whose
values are individuals of a class, and those whose values belong to a data type. For
example, the value of the property cause damage to is an individual of the class
Organism that can be damaged, whereas the property has age takes an integer
value such as ten. As with classes, a property can be a subproperty of one or more
other properties. For example, the property cause damage to can have the
subproperty cause feeding damage to. We can then conclude that if an individual
is related to another by the property cause feeding damage to, it is also related to
the other by the property cause damage to. The individuals and data values that
can participate in a property can be restricted using a domain and a range.

* Domain: The domain of a property is the set of individuals to which the property
can be applied. For example, assume that the property cause damage to has the
domain Pest. Then, if A causes damage to B, A must be a pest.

* Range: The range of a property is the set of individuals (or data values) that the
property can take as its value. For example, the property cause damage to may be
assigned the range Plant. We can then deduce that if A causes damage to B, B
must be a plant.
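The inference rules described in this list (subclass membership, subproperty entailment, and domain and range typing) can be illustrated with a small, stdlib-only Python sketch. It does not show how an OWL reasoner such as Pellet works internally. The class, property, and individual names (Pest, Soybean, armyworm_16, and so on) are hypothetical stand-ins chosen to match the examples above, and the sketch applies only the rules just described.

```python
"""Toy forward inference over subclass, subproperty, domain and range axioms."""

# Hypothetical axioms mirroring the examples in the text (not the real ontology).
SUBCLASS_OF = {"Organism": {"Thing"}, "Plant": {"Organism"},
               "Pest": {"Organism"}, "Soybean": {"Plant"}}
SUBPROPERTY_OF = {"cause_feeding_damage_to": {"cause_damage_to"}}
DOMAIN = {"cause_damage_to": "Pest"}
RANGE = {"cause_damage_to": "Plant"}


def ancestors(node, parents):
    """Return node plus everything reachable by following parent links."""
    seen, stack = {node}, [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer(types, triples):
    """Return (entailed triples, entailed class memberships).

    types maps each individual to its asserted classes; triples is a set of
    (subject, property, object) assertions.
    """
    # Subproperty rule: a triple also holds for every superproperty.
    all_triples = {(s, q, o) for s, p, o in triples
                   for q in ancestors(p, SUBPROPERTY_OF)}
    memberships = {ind: set(classes) for ind, classes in types.items()}
    # Domain and range rules type the subject and the object of each triple.
    for s, p, o in all_triples:
        if p in DOMAIN:
            memberships.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            memberships.setdefault(o, set()).add(RANGE[p])
    # Subclass rule: membership propagates to every superclass.
    closed = {ind: set().union(*(ancestors(c, SUBCLASS_OF) for c in classes))
              for ind, classes in memberships.items()}
    return all_triples, closed


triples, types = infer({"soybean_20": {"Soybean"}},
                       {("armyworm_16", "cause_feeding_damage_to", "soybean_20")})
print(sorted(types["armyworm_16"]))  # ['Organism', 'Pest', 'Thing']
```

The subject of the cause_feeding_damage_to assertion is first typed as Pest through the domain of its superproperty. The subclass axioms then make it an Organism and a Thing. For this small rule set, one pass in this order (subproperty, then domain and range, then subclass) is enough, because no rule produces input for an earlier one.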

Methodology for Building the Crop-Pest Ontology

Noy pointed out some fundamental rules of ontology development. First, the best method focuses on the intended application. Second, after an initial version of the ontology has been defined, the ontology should be refined by using it in applications and by discussing it with experts in the field. Third, concepts in the ontology should be close to the objects (physical or logical) and relationships in the domain. For example, nouns in sentences that describe domain knowledge are likely to be objects, and verbs are likely to be relationships (Noy and McGuinness, 2001). Kalyanpur outlined a casual Web ontology development process (Kalyanpur et al., 2004) that emphasizes the following steps:

* Ontology developers start with certain domain information they want to model, and
based on that information, they derive a loose terminology of concepts and
relationships in the domain.

* The concepts are structured into a hierarchy and associated with their properties.

* The ontology is refined by browsing and searching concepts.

Uschold proposed a more detailed skeletal methodology for building ontologies (Uschold and Gruninger, 1996), consisting of the following steps:

* Identification of purpose and scope.

* Ontology capture: identifying the key concepts and relationships in the domain;
writing precise, unambiguous text definitions for these concepts and
relationships; and identifying terms to refer to them.

* Ontology coding: explicit representation of the ontology in some formal
language.

* Integration of existing ontologies.

* Evaluation of the ontology.

* Documentation of the ontology.

Noy also outlined a guide to creating an ontology (Noy and McGuinness, 2001). Her guidelines were similar to Uschold's but focused on more practical aspects, using the example of "wine and food." Her simple guidelines for developing an ontology were the following:

* Define classes in the ontology.

* Arrange the classes in a taxonomic hierarchy.

* Define the properties (slots) of the classes and describe the allowed values for these slots.

* Fill in the values for properties for individuals.

In general, the crop-pest ontology was built according to Noy's methods. The methodology for developing the crop-pest ontology was as follows:

1. Determination of the purpose and domain of the crop-pest ontology

2. Consideration of the reuse of existing ontologies

3. Enumeration of important terms in the crop-pest ontology

4. Building classes and the class hierarchy

5. Definition of the properties of classes

6. Creation of individuals

Purpose and Domain of the Crop-Pest Ontology

The first step in developing the crop-pest ontology was to define its purpose and domain. The purpose of the crop-pest ontology is to serve as a tool for browsing and searching 291 images and their associated captions. To fulfill this purpose, the crop-pest ontology needed to cover all of the concepts in the image captions. Figure 2-1 shows some of those captions.

Thrips damage to peanut leaves.
Closeup of adult thrips on peanut leaf.
Rednecked peanutworm and damage on peanut.
Rednecked peanutworm in peanut bud.
Hopperburn caused by leafhoppers on peanut leaves.
Closeup of hopperburn caused by leafhoppers on peanut leaf.
Overview of hopperburn on peanut caused by leafhoppers.
Adventitious root growth on peanut caused by three-cornered alfalfa girdling.
Lesser cornstalk borer silken feeding tubes on peanut pegs.
Lesser cornstalk borer adult moths (male left, female right).
Closeup of lesser cornstalk borer larva on peanut leaf.
Whitefringed beetle grub in soil at base of peanut plant.
Spotted cucumber beetle (Southern Corn Rootworm adult) on peanut leaf.
Southern corn rootworm (Spotted cucumber beetle larva) on peanut peg.
Sugarcane beetle on finger.
Cutworm on soil curled in C-shape.
Figure 2-1. Image captions in the image collection

As shown in Figure 2-1, the image captions describe insects, plants, relationships between them such as damage, and environmental elements such as soil. The domain of this ontology is therefore composed of crops, pests, the relationships between them, and the environmental elements surrounding them. Crops (such as soybeans, peanuts, and cotton) are defined as plants grown by farmers for food or other uses. These crops also offer food or shelter for various developmental stages of insects. The term "pest" refers to any insect that damages a crop by introducing disease or by causing physical and physiological changes. The crop-pest ontology therefore represents both the external structure of crops and insects and the internal processes or events that result from their interactions. It also reflects the nature of crop-pest relationships and the environmental elements surrounding crops and pests. In addition, the ontology includes some concepts that are not directly related to crops and pests. For example, the caption "Sugarcane beetle on finger" in Figure 2-1 contains the concept "finger." The purpose of the ontology is to support image retrieval, and users searching for images often need a reference for size. Such concepts are therefore included. In this example, a human body part (a finger) serves as the scale.

Consideration of Reuse of Existing Ontologies

One advantage of ontologies is that existing ontologies built for the same domain can be reused, refined, and extended for a particular purpose. In the agricultural field, however, no ontology suited to this purpose and to the crop-pest domain yet exists. The National Agricultural Library Thesaurus (NALT) covers agricultural fields, including the crop-pest domain, and can serve as a reference for building the crop-pest ontology. However, the ambiguity of the relationships in NALT prevented it from being reused directly. Some general concepts in the upper level of the crop-pest ontology, such as "abstract" and "physical," were imported from the Suggested Upper Merged Ontology (SUMO). SUMO is an upper-level ontology that provides definitions for general-purpose concepts and acts as a foundation for more specific domain ontologies (Niles and Pease, 2001).

Figure 2-2 shows some concepts from SUMO that make up the upper level of the crop-pest ontology. Most domain-specific concepts, such as "stink bug" and "peanut," were newly created.

abstract
  - attribute
      o internal attribute
      o relational attribute
  - quantity
      o content-based quantity
      o physical quantity
physical
  - object
      o agent
      o collection
      o self connected object
  - process
      o biological process
      o pathological process
Figure 2-2. Concepts from SUMO that compose the upper level of the crop-pest
ontology.

Enumeration of Important Terms in the Crop-Pest Ontology

It is useful to list the important terms of the domain, because such a list helps ontology developers group terms manually. Image captions are a good source of terms. They are directly related to the crop-pest domain, and they are written specifically to describe the content of each image. The 291 image captions were tokenized, and the frequency of each term was counted. The resulting 257 terms were listed in order of their frequency in the image captions. The ten most frequent terms are shown in Figure 2-3, and the others are shown in Appendix B. However, these 257 terms are not all of the terms in the domain. Terms that did not appear in this list but were needed for the domain were added during the development of the ontology. For example, the term "insect" did not appear in the list. It was needed to categorize concepts such as "stink bug," "beetle," and "armyworm," so it was included in the crop-pest ontology.

Term          Frequency
On            217(a)
Cotton        124
Soybean       82
Leaf          69
Photograph    68
Larva         59
Damage        57
Peanut        47
Of            43
Boll          31
Figure 2-3. The ten most frequently appearing terms in the 291 image captions. Others
are shown in Appendix B. (a) indicates the frequency of the word "on" in the
291 image captions.
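The term enumeration step (tokenizing the captions and counting how often each term occurs) can be sketched in a few lines of Python. The tokenization rule here (lowercasing and keeping alphabetic words, with internal hyphens) is an assumption, because the chapter does not say how the 291 captions were tokenized. The four captions used below are taken from Figure 2-1 only as sample input.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase alphabetic words, allowing internal hyphens
# such as "three-cornered". No stemming is applied.
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def term_frequencies(captions):
    """Count how often each token appears across all captions."""
    counts = Counter()
    for caption in captions:
        counts.update(TOKEN.findall(caption.lower()))
    return counts


SAMPLE_CAPTIONS = [
    "Thrips damage to peanut leaves.",
    "Closeup of adult thrips on peanut leaf.",
    "Rednecked peanutworm and damage on peanut.",
    "Hopperburn caused by leafhoppers on peanut leaves.",
]

for term, count in term_frequencies(SAMPLE_CAPTIONS).most_common(2):
    print(term, count)
# Prints "peanut 4" and then "on 3".
```

Because no stemming is applied, "leaf" and "leaves" are counted as separate terms. The chapter does not state whether the original enumeration merged such variants.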

Building Classes and the Class Hierarchy

Noy described three approaches to developing a class hierarchy (Noy and McGuinness, 2001):

* A top-down development process, which begins with the definition of the most
general concepts in the domain and proceeds to subsequent specification of the
concepts.

* A bottom-up development process, which begins with the definition of the most
specific concepts and continues with the subsequent grouping of these concepts
into more general concepts.

* A combination development process, which starts with a few notable top-level
concepts and a few salient specific concepts and then generalizes and specializes
them as appropriate.

The crop-pest ontology was developed using the combination approach. The process began with a few notable top-level classes. In the Web Ontology Language standard, every class is a subclass of owl:Thing (OWL, 2005), so the root of the crop-pest ontology is the class "thing." This class is the most general concept in the ontology, and all other classes are its subclasses. Classes imported from SUMO were used as the top-level classes immediately below "thing" (see Figure 2-2), because they provide a foundation for more specialized classes. For example, under the OWL standard, a specific concept such as "number" is a subclass of the class "thing." SUMO divides "thing" into two subclasses, "abstract" and "physical." The two classes are disjoint: no individual of the class "abstract" can also be an individual of the class "physical." The class "physical" represents things that have a location in space and time. Since a number is not located in space and time, the class "number" can be assigned as a subclass of the class "abstract." This process of deciding where the class "number" belongs was continued down through the top-level classes until its correct location was found.
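The top-down placement of a concept such as "number" can be pictured as a descent from the root. At each level, the descent follows the child class whose defining condition the concept satisfies. The following Python sketch is illustrative only. The tree is a small fragment of Figure 2-2, and the feature names (located_in_space_and_time, is_quantity, is_attribute, is_process) are hypothetical tests standing in for the SUMO class definitions. Because "abstract" and "physical" are disjoint, their two tests are written to be complementary, so a concept can follow only one branch.

```python
# Fragment of the upper hierarchy. Each child carries a hypothetical test
# applied to a dictionary of features describing the concept being placed.
CHILDREN = {
    "thing": [
        ("abstract", lambda c: not c["located_in_space_and_time"]),
        ("physical", lambda c: c["located_in_space_and_time"]),
    ],
    "abstract": [
        ("quantity", lambda c: c.get("is_quantity", False)),
        ("attribute", lambda c: c.get("is_attribute", False)),
    ],
    "physical": [
        ("object", lambda c: not c.get("is_process", False)),
        ("process", lambda c: c.get("is_process", False)),
    ],
}


def place(concept, root="thing"):
    """Descend while exactly one child's test holds; return the path taken."""
    path = [root]
    while True:
        matches = [child for child, test in CHILDREN.get(path[-1], [])
                   if test(concept)]
        if len(matches) != 1:  # leaf reached, or no unique child: stop here
            return path
        path.append(matches[0])


number = {"located_in_space_and_time": False, "is_quantity": True}
print(place(number))  # ['thing', 'abstract', 'quantity']
```

The descent stops when no single child's condition holds. This leaves the concept under the most specific class that can be justified. In this fragment, "number" ends under "quantity," which is in turn under "abstract."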

The middle-level classes in the crop-pest ontology represent plants, related insects, and environmental objects such as soil. For example, the class "insect" became a middle-level class in the ontology. All of the superclasses of the class "insect" are shown in Figure 2-4.


owl:Thing
  physical
    object
      self connected object
        corpuscular object
          organic object
            organism
              animal
                invertebrate
                  arthropod
                    insect
                      stink bug
                        southern green stink bug
Figure 2-4. Class hierarchy from the root class "thing" down to the class "insect" and its
subclass "southern green stink bug," showing top-level, middle-level and
bottom-level classes. Classes in blue were imported from SUMO and classes
in purple were created to describe the bottom level classes.


The most specific classes were created using the bottom-up approach and then attached to the middle-level classes. For example, the concept "southern green stink bug" (Figure 2-4) appears in the image caption "southern green stink bug on cotton leaf," and it was placed under the class "insect." This bottom-up approach makes assigning subclasses easier. The concepts appearing in image captions are specific enough that the class each one belongs to is easy to identify.
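Caption phrases name specific concepts directly, so a simple way to find them is a longest-match lookup of known class labels in each caption. The Python sketch below assumes a hypothetical label index (CLASS_LABELS) built from the ontology. It only identifies which specific classes a caption mentions. Placing those classes under the right middle-level class remains the developer's decision.

```python
# Hypothetical index from lowercase class labels to class identifiers.
CLASS_LABELS = {
    "stink bug": "stink_bug",
    "southern green stink bug": "southern_green_stink_bug",
    "cotton": "cotton",
    "leaf": "leaf",
}


def concepts_in_caption(caption, labels=CLASS_LABELS):
    """Scan left to right, preferring the longest label that matches."""
    words = caption.lower().rstrip(".").split()
    max_len = max(len(label.split()) for label in labels)
    found, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in labels:
                found.append(labels[phrase])
                i += n
                break
        else:  # no label starts at this word
            i += 1
    return found


print(concepts_in_caption("Southern green stink bug on cotton leaf"))
# ['southern_green_stink_bug', 'cotton', 'leaf']
```

Preferring the longest match ensures that "southern green stink bug" is recognized as one concept rather than as the more general "stink bug."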

Defining Properties

Building a hierarchical structure of classes is not enough to create an ontology, because the hierarchy alone cannot fully describe a concept. A complete description is achieved by assigning properties. As described earlier, properties are of two types: object properties and datatype properties. An object property describes the relationship between two individuals. For example, an insect has developmental stages, so the property "has_developmental_stage_of" can relate an individual of the class "insect" to an individual of one of the classes "adult," "pupa," "larva," or "egg." For this property, the class "insect" is the domain, and the classes "adult," "pupa," "larva," and "egg" together form the range. A datatype property describes the relationship between an individual and a data value, such as a string or an integer. For example, the datatype property "the_number_of_legs" describes the fact that an insect has six legs by assigning it the integer value 6.

Property: the_number_of_legs
  Type: owl:DatatypeProperty
  Domain: insect
  Range: "6"^^http://www.w3.org/2001/XMLSchema#int

Property: has_developmental_stage_of
  Type: owl:ObjectProperty
  Domain: insect
  Range: adult OR pupa OR larva OR egg

Figure 2-5. The object property "has_developmental_stage_of" and the datatype property
"the_number_of_legs" in the class "insect."

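The distinction between the two property types in Figure 2-5 can be mirrored in a small Python model. An object property's range is a set of classes (the OR in the figure), while a datatype property's range is a data type. This is a sketch, not an OWL API. The dataclass names, the use of Python's int for the XML Schema integer type, the hypothetical individual larva_3, and the conforms helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProperty:
    name: str
    domain: str
    range_classes: frozenset  # value must be an individual of one of these


@dataclass(frozen=True)
class DatatypeProperty:
    name: str
    domain: str
    datatype: type  # Python type standing in for an XML Schema datatype


HAS_DEVELOPMENTAL_STAGE_OF = ObjectProperty(
    "has_developmental_stage_of", "insect",
    frozenset({"adult", "pupa", "larva", "egg"}))
THE_NUMBER_OF_LEGS = DatatypeProperty("the_number_of_legs", "insect", int)


def conforms(prop, subject_class, value, value_class=None):
    """Check a single assertion against the property's domain and range.

    For object properties, value_class is the class of the value individual.
    """
    # Exact match only; subclasses of the domain are ignored in this sketch.
    if subject_class != prop.domain:
        return False
    if isinstance(prop, ObjectProperty):
        return value_class in prop.range_classes
    return type(value) is prop.datatype  # exact type, so True is not an int


print(conforms(THE_NUMBER_OF_LEGS, "insect", 6))  # True
print(conforms(HAS_DEVELOPMENTAL_STAGE_OF, "insect", "larva_3", "larva"))  # True
```

A reasoner would also accept individuals of subclasses of "insect" (such as a stink bug) as subjects. The exact-match domain check here is a simplification.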
One of the most difficult decisions in developing the crop-pest ontology was choosing the finest level of granularity to represent. Noy suggested that this decision depends on the potential applications of the ontology (Noy and McGuinness, 2001). In other words, the level of granularity is determined by the most specific concepts that a given application requires the ontology to represent. For the crop-pest ontology, the level of granularity was based on the intended application: browsing 291 images. The ontology therefore needed to contain at least all of the concepts appearing in the image captions, and its most specific classes represent those concepts.

Creating Individuals

The last step of building the crop-pest ontology was creating individuals of classes

in the hierarchy. Defining an individual of a class involves the following steps:

* Choosing a class
* Creating an individual of the class
* Filling out properties of the individual










For example, the individual "soybean_20" was created to represent a specific soybean plant. The class "soybean" has the following properties:

* "locate_place" (Boolean)
* "locate_in_time" (Boolean)
* "is_host_of"
* "has_parts"
* "is_damaged_by"

Each property is filled with a value, as shown in Figure 2-6. For example, the datatype property "locate_place" is filled with the Boolean value true, and the object property "is_damaged_by" is filled with the individual "yellow_striped_armyworm_16."



Individual: soybean_20
Types: soybean
Relationships:
  rdf:type          soybean
  locate_place      true
  locate_in_time    true
  is_host_of        yellow_striped_armyworm_16
  has_parts         pod_4
  has_parts         bloom_6
  has_parts         bud_7
  has_parts         leaf_8
  has_parts         stem_9
  is_damaged_by     yellow_striped_armyworm_16

Figure 2-6. An individual of the class "soybean," with a value filled in for each property.
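The three steps for creating an individual (choose a class, create the individual, fill in its properties) can be sketched in Python using the property values shown in Figure 2-6. The Individual class and the unfilled helper are hypothetical. Identifiers such as pod_4 are assumed to follow the underscore style of soybean_20. Property values are kept in lists because a property such as has_parts takes several values.

```python
from collections import defaultdict

# Properties defined for the class "soybean", as listed in the text.
SOYBEAN_PROPERTIES = ["locate_place", "locate_in_time", "is_host_of",
                      "has_parts", "is_damaged_by"]


class Individual:
    """Hypothetical in-memory individual: one class plus property values."""

    def __init__(self, name, cls):
        self.name = name                 # step 2: create the individual
        self.cls = cls                   # step 1: the chosen class
        self.values = defaultdict(list)

    def fill(self, prop, *values):
        """Step 3: record one or more values for a property."""
        self.values[prop].extend(values)
        return self                      # allow chained calls


def unfilled(individual, properties):
    """Properties of the class that have no value yet."""
    return [p for p in properties if not individual.values.get(p)]


soybean_20 = (Individual("soybean_20", "soybean")
              .fill("locate_place", True)
              .fill("locate_in_time", True)
              .fill("is_host_of", "yellow_striped_armyworm_16")
              .fill("has_parts", "pod_4", "bloom_6", "bud_7", "leaf_8", "stem_9")
              .fill("is_damaged_by", "yellow_striped_armyworm_16"))

print(unfilled(soybean_20, SOYBEAN_PROPERTIES))  # []
```

The unfilled helper corresponds to checking that every property defined for the class has been filled in, as in Figure 2-6.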


Based on the above methodology, 615 classes and individuals were created in the

crop-pest domain.

The Crop-Pest Ontology

The crop-pest ontology contains 286 classes, 81 object properties, 36 datatype properties, and 305 individuals. It is written in OWL, and the full ontology is listed in Appendix D.

Validation of the Crop-Pest Ontology

Validating the crop-pest ontology meant using an OWL reasoner to detect unsatisfiable concepts and report errors. Unsatisfiable concepts are concepts that cannot be true of any possible individual. Because they cannot be used to describe any individual, such concepts usually result from a basic logic error made during ontology development. They are also easy for a reasoner to detect and display (Parsia et al., 2005). Pellet is an open-source, Java-based OWL DL reasoner (Pellet OWL reasoner, 2005). Pellet provides utilities to determine which OWL species an ontology uses (such as OWL Full or OWL DL), to check ontology consistency, to classify the taxonomy, and so on. Pellet was used to check the consistency of the crop-pest ontology. Figure 2-7 shows the result of this consistency check.

Results
Input file: Text area
OWL Species: DL
DL Expressivity: ALCHIO(D)
Consistent: Yes
Figure 2-7. Results of the consistency check of the crop-pest ontology using the Pellet
OWL reasoner.
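Pellet performs full OWL DL reasoning, which is far beyond a few lines of code. One common source of unsatisfiable classes is easy to illustrate, however: a class that inherits from two classes declared disjoint, such as "abstract" and "physical." The Python sketch below checks only that pattern. The hierarchy fragment and the deliberately faulty class bad_class are hypothetical.

```python
# Hypothetical hierarchy fragment with one deliberately faulty class.
SUBCLASS_OF = {
    "abstract": {"thing"}, "physical": {"thing"},
    "quantity": {"abstract"}, "number": {"quantity"},
    "object": {"physical"},
    "bad_class": {"number", "object"},  # deliberate modelling error
}
DISJOINT = [("abstract", "physical")]


def ancestors(cls, parents):
    """The class itself plus all of its direct and indirect superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable(parents, disjoint):
    """Classes that fall under both members of some disjoint pair."""
    bad = []
    for cls in parents:
        up = ancestors(cls, parents)
        if any(a in up and b in up for a, b in disjoint):
            bad.append(cls)
    return sorted(bad)


print(unsatisfiable(SUBCLASS_OF, DISJOINT))  # ['bad_class']
```

An ontology can still be consistent while containing an unsatisfiable class, as long as no individual is asserted to belong to that class. This is why reasoners report the two results separately.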

Evaluation of the Crop-Pest Ontology

Complete ontologies not only support their intended applications and function properly but can also be reused in the development of other ontologies. Therefore, the evaluation of ontologies is essential. There are two approaches to ontology evaluation: qualitative and quantitative (Brewster et al., 2004).

In the qualitative approach, an ontology developer with knowledge of a particular domain is asked to evaluate an ontology from the perspective of a set of principles. For example, Gomez claimed that the lack of methods for evaluating ontologies could be an obstacle to their use in several application domains (Gomez, 1999). He suggested ways to evaluate ontologies technically, especially the definitions of their classes. Evaluating the class definitions is a technical evaluation that must be performed throughout the ontology development process. Its purpose is to discover deficiencies in the defined classes, individuals, and properties. First, the structure of the ontology should be checked against the criteria that the definitions should state clear, necessary, and sufficient conditions and should be written in a formal language. In addition, the definitions should be logically consistent (Gruber, 1993). Second, the syntax of the definitions should be checked for incorrect structures, such as loops between definitions. Third, the content of the definitions should be checked to detect what the ontology defines, does not define, or defines incorrectly; what can, cannot, or may be inferred; and what may be inferred incorrectly. Finally, Gomez presented three case studies of ontology evaluation, covering the evaluation of definitions, of the hierarchy, and of properties. Kohler reported an evaluation of existing ontologies for molecular biology data sources (Kohler and Schulze-Kremer, 2002). He technically checked the stability of the concepts, the validity of the hierarchy, and how widely the ontologies were used.

The quantitative approach is a data-driven approach to ontology evaluation that tests how well an ontology covers a domain-specific corpus, that is, a collection of domain-related terms. Brewster chose the domain of art and artists, for which he had developed the ARTEQUAKT ontology, and collected 41 arbitrary texts about a number of artists from the Internet. He compared the ARTEQUAKT ontology with the SUMO ontology and the Ontology of Science (Ontology of Science, 2005), even though SUMO and the Ontology of Science did not cover the same domain as ARTEQUAKT. At the time, he was unable to find any ontology covering the same domain, because ontology development research was still in its early stages. He then measured how much of the corpus each ontology covered, to determine how appropriate each ontology was for representing the knowledge of the domain represented by the corpus.

The crop-pest ontology was evaluated using the quantitative approach, by testing its coverage of a domain-specific corpus. The first step was to find an ontology covering the same domain. The AGROVOC ontology has high coverage of the agricultural domain, including the crop-pest domain (AGROVOC ontology, 2005), so the crop-pest ontology was compared with AGROVOC. The second step was to collect an arbitrary domain-specific corpus. As mentioned before, most concepts in the crop-pest ontology came from DDIS image captions describing crops and related pests, so similar image captions from the same domain were good candidates for the corpus. Two restrictions were applied in choosing test captions: they must not have been used in developing the crop-pest ontology, and they had to relate to cotton and its pests. The Texas Agricultural Extension Service provided 95 image captions that served as a domain-specific corpus (Texas Agricultural Extension Service, 2005). Fifty of those 95 captions were selected at random, and the 138 terms appearing in those 50 captions were used to test the crop-pest ontology, as shown in Appendix C. The coverage of the 138 terms is shown in Table 2-1. The crop-pest ontology covered a higher percentage of the tested terms than AGROVOC did, showing that it fits the selected domain-specific corpus more closely. This indicates how appropriate the crop-pest ontology is for representing the knowledge of the domain described by the selected texts.

Table 2-1. The coverage of 138 tested terms in the crop-pest and AGROVOC ontologies.
                         Crop-pest ontology    AGROVOC
Coverage (percentage)    44.93%                30.43%
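The coverage measure in Table 2-1 is the share of tested terms that the ontology contains. The chapter does not specify how terms were matched, so the Python sketch below assumes a case-insensitive exact match against a set of ontology labels. The sample terms and labels are hypothetical. As a consistency check on the table, 44.93% and 30.43% of 138 terms correspond to 62 and 42 terms, respectively.

```python
def coverage(test_terms, ontology_labels):
    """Percentage of test terms found (case-insensitively) among the labels.

    Assumption: exact string matching; synonyms and variants are not merged.
    """
    vocabulary = {label.lower() for label in ontology_labels}
    covered = sum(1 for term in test_terms if term.lower() in vocabulary)
    return 100.0 * covered / len(test_terms)


# Hypothetical sample: two of the four test terms are covered.
print(coverage(["boll", "cotton", "whitefly", "square"],
               ["Cotton", "Boll", "leaf"]))  # 50.0

# The percentages reported in Table 2-1, recomputed from term counts.
print(round(100 * 62 / 138, 2), round(100 * 42 / 138, 2))  # 44.93 30.43
```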

Conclusion and Discussion

The methodology for developing the crop-pest ontology is based on three fundamental rules: (1) the best method focuses on the intended application; (2) after an initial version of the ontology has been defined, the ontology must be refined by using it in applications and/or by discussing it with experts in the field; and (3) concepts in the ontology should be close to the objects (physical or logical) and relationships in the domain. The procedure for developing the crop-pest ontology was as follows:

* Determination of the purpose and domain of the crop-pest ontology
* Consideration of the reuse of existing ontologies
* Enumeration of important terms in the crop-pest ontology
* Determination of the classes and the class hierarchy
* Definition of properties
* Creation of individuals









Following this procedure, the crop-pest ontology was developed. It contains 286 classes, 81 object properties, 36 datatype properties, and 305 individuals, and it is written in OWL.

Validation of the crop-pest ontology was essential for finding unsatisfiable concepts. The Pellet OWL reasoner was used to validate the ontology and reported it to be consistent. Evaluation, as well as validation, of the crop-pest ontology was important, because complete ontologies not only support their original applications by functioning properly but can also be reused in developing other ontologies. The data-driven evaluation showed that the crop-pest ontology covers a domain-specific corpus well compared with AGROVOC, a well-known ontology that covers all agricultural domains. However, the data-driven approach has limitations compared with the qualitative approach, in which ontology developers check the definitions of classes, properties, and individuals. In particular, it cannot check the logical correctness of those definitions. An OWL reasoner could check logical correctness, but only if it supports reasoning over all of the classes, properties, and individuals. At present, Pellet supports consistency checking and some query processing. An evaluation could also be designed around the purpose of the crop-pest ontology, which is browsing the 291 images in a collection. If users find that browsing images through the crop-pest ontology is more convenient than conventional ways of browsing images, that finding would serve as another evaluation of the crop-pest ontology.














CHAPTER 3
A PRACTICAL COMPARISON BETWEEN THESAURUS AND ONTOLOGY
TECHNIQUES AS A BASIS FOR SEARCH IMPROVEMENT

Introduction

Jacob argued that a controlled vocabulary 3 is itself an ontology, so long as the standard concept of a controlled vocabulary is similarly redefined (Jacob 2003). Chun pointed out that thesauri and ontologies share common traits. Both describe domain-specific knowledge, contain terms (or concepts) and relations among those terms, make use of hierarchical structures, are used in information management applications to catalog and retrieve information, and need to be maintained and revised constantly. Yet he also stressed that they are not the same, because a thesaurus has a limited number of relations, which can result in relatively meager expressiveness in representing specific knowledge (Chun 2004). Another difference is that the two systems have different points of emphasis. Ontology builders are primarily concerned with how software and associated machines interact with ontologies in a logical way, whereas thesaurus developers (such as librarians) focus on how users retrieve information with the aid of a thesaurus alone (Jibbajabba 2002). The limited number of relations and the different points of emphasis do not cover all of the differences between thesauri and ontologies. Most notably, they omit differences arising from the characteristics of the languages used to describe domain knowledge. In this chapter, we explore additional differences between thesauri and ontologies. We not only describe the differences themselves but also provide specific examples that explicitly reveal how thesauri differ from ontologies. We selected the National Agricultural Library Thesaurus (NALT) as the thesaurus of study, because it covers agricultural domains and has performed well as a controlled vocabulary in several well-known agricultural information systems. Likewise, we developed an ontology covering crops and related insects, hereafter referred to as the "crop-pest ontology," as a practical domain-specific ontology. We then performed a practical comparison between NALT and the crop-pest ontology.

3 A controlled vocabulary is the same as a thesaurus.

The rest of this chapter is organized as follows. NALT is described first, since the crop-pest ontology was introduced in Chapter 2. The mechanics of the practical comparison are then presented together with its results, followed by the conclusions of the comparison. Finally, the respective abilities of NALT and the crop-pest ontology with respect to agricultural information systems are discussed.

The National Agricultural Library Thesaurus

The National Agricultural Library Thesaurus (NALT) is intended for indexing

materials and for aiding retrieval in agricultural information systems. The thesaurus was

prepared by staff of the National Agricultural Library (NAL) to meet the needs of the

United States Department of Agriculture (USDA) and the Agricultural Research Service

(ARS) for an agricultural thesaurus (NALT 2005). NALT is the controlled vocabulary of

NAL's bibliographic database of citations to agricultural resources, known as

AGRICOLA. The Food Safety Research Information Office (FSRIO) and the

Agricultural Network Information Center (AgNIC) use NALT as the controlled

vocabulary of their information system. NALT is also used for browsing within the ARS

and AgNIC web sites (NALT 2005).









NALT is structured into 17 subject categories. These categories are derived from

the NAL Agricultural Classification Prototype, originally developed for the Agricultural

Information Network. 4 The subject scope of agriculture is broadly defined in NALT and

includes terminology related to the supporting biological, physical and social sciences.

Relationships between Terms

NALT includes hierarchical, equivalence and associative relationships among

concepts. Hierarchical relationships are indicated by Broader Term and Narrower Term

designations in the thesaurus. This hierarchical structure of relationships is a

distinguishing feature of the thesaurus, in contrast to a simple list of alphabetically

ordered terms. Broader terms represent more general concepts than narrower terms:

* Crop yield
* Broader Terms: crop production, yields
* Narrower Terms: grain yield, yield components

"Grain yield" is subordinate to "crop yield" since it is a more specific type of crop

yield. Similarly, "crop yield" belongs to the broader concept class "yields." This relationship suggests that a searcher interested in "crop yield" would also be interested in specific types of crop yield, such as "grain yield" and "yield components." Equivalence relationships are designated by Use and Used for cross-references. An equivalence relationship is established when two or more terms represent the same (or nearly the same) concept. Examples include synonymous terms, common and scientific names of organisms, spelling variants, usage variants, and acronyms:

* Mechanical damage
* Used for: mechanical injuries


4 http://www.agnic.org










As shown here, "mechanical injuries" is a synonymous term for "mechanical

damage." The reciprocal relationships appear as follows:

* Beetles
* Use: coleoptera

Here, "coleoptera" is the scientific name and preferred term, while "beetles" is a common name that directs users to the more appropriate term for indexing and retrieval. In general, NALT uses its "Use" and "Used for" designations to direct users from non-preferred terms to the more appropriate descriptors for indexing and retrieval. Associative relationships are designated by reciprocal Related Term references. An associative relationship links terms that are conceptually related when the relationship is neither hierarchical nor one of equivalence. Associative relationships alert indexers and searchers to other related concepts in the thesaurus that may be of interest to them:

* Insects
* Related Terms: insecticides

Here, "insecticides" is a related concept to "insects," because insecticides are

chemical substances used to kill insects.
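A search interface might use these three relationship types as follows: Use references redirect a non-preferred term to its descriptor, Narrower Terms expand the query, and Related Terms are offered as suggestions. The Python sketch below illustrates this. The records are built only from the examples in this section and are not actual NALT data. The expand_query function is a hypothetical illustration, not part of any NALT service.

```python
from collections import deque

# Toy thesaurus records taken from the examples above (not real NALT data).
USE = {"beetles": "coleoptera", "mechanical injuries": "mechanical damage"}
NARROWER = {"yields": ["crop yield"],
            "crop yield": ["grain yield", "yield components"]}
RELATED = {"insects": ["insecticides"]}


def expand_query(term):
    """Return (terms to search, related terms to suggest) for a user term."""
    preferred = USE.get(term.lower(), term.lower())  # follow a Use reference
    expanded, queue = [], deque([preferred])
    while queue:  # breadth-first walk over Narrower Term links
        current = queue.popleft()
        if current not in expanded:
            expanded.append(current)
            queue.extend(NARROWER.get(current, []))
    return expanded, RELATED.get(preferred, [])


print(expand_query("Crop yield"))
# (['crop yield', 'grain yield', 'yield components'], [])
print(expand_query("Beetles"))
# (['coleoptera'], [])
```

The expansion only follows the links that are explicitly recorded. It does not interpret what a relationship means.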

Comparative Analysis

In this section, we compare NALT and the crop-pest ontology to analyze their differences further. The main points of comparison are the representation of domain knowledge and the reasoning facilities that this representation supports. The representation of domain knowledge is a crucial feature of both systems, because each must capture domain knowledge and describe it using

components such as concepts, terms, relations, or properties. We compare NALT and the









crop-pest ontology by showing ways to represent knowledge and then testing their

abilities to describe knowledge of domain concepts. Then, we examine reasoning

facilities within each of these technologies. Reasoning is the use of logical expressions by

agents such as humans or machines to find results or draw conclusions (Encarta 2005).

When agents can access and process well-structured knowledge, they can make logical inferences and deduce conclusions from that knowledge on their own. We illustrate specific applications of reasoning, such as ontology validation and search.

NALT covers all of agriculture, including crops and related insects. Because the ontology covers only crops and their associated pests, the comparative analysis was restricted to this subset, called the "crop-pest domain." A total of 631 concepts were generated from the crop-pest ontology database for the comparative analysis. On the NALT side, a subset of its 41,000 preferred terms (terms used for indexing and searching), together with related terms (terms linked to those preferred terms), covering the crop-pest domain was examined.

Representing Domain Knowledge

Concepts, Semantic Relationships, and Their Logical Consistency

We explored how best to represent domain knowledge using concepts and semantic relationships in both NALT and the crop-pest ontology. We selected the concept Plant, which is the basic concept in our domain of focus (the crop-pest domain), and examined Plant and its relationships to other concepts for logical consistency.

In NALT, the term plants is described through its broader term (BT), narrower term (NT), and related term (RT) relationships, as shown in Figure 3-1.









plants
  BT: organisms
  NT: algae and seaweeds
      alpine plants
      ...
      scions
      wild flowers
      wild plants
      woody plants
  RT: autotrophs
      flora
Figure 3-1. Description of the term plants in NALT ("..." denotes additional terms not shown)
The crop-pest ontology, however, has a wider variety of relationships between concepts. Superclass and subclass relationships, for instance, indicate the hierarchical structure around a particular concept. In addition, properties represent various relationships between concepts, or between concepts and data values (such as strings and integers). The representation of the concept Plant using the semantic structure of the crop-pest ontology is shown in Figure 3-2.

There are several clear differences between the representations of the same concept Plant in NALT and in the crop-pest ontology. First, the formal language of the ontology provides machine-readable information, whereas the informality of NALT means that a machine cannot readily process the information within the thesaurus itself. Second, a subclass in the ontology should be logically defined by the "necessary and sufficient (if possible) conditions" it inherits from its superclass, whereas NALT merely asserts BT/NT relationships. In Figure 3-2, plant is a specific case of organism, meaning that plant must satisfy the conditions it inherits from organism, such as the characteristics of an organism as a living thing (replication, metabolism, etc.).













[Figure 3-2, partially recovered. Definition of Plant: "a photosynthetic organism that has cellulose cell walls, cannot move of its own accord, grows on the earth or in water, and usually has green leaves." Kingdom Plantae]






































called is_elementof in the crop-pest ontology has constraints on its values, namely its domain and range, which can further reveal the logical consistency of the stored concepts. These constraints give the crop-pest ontology the additional ability of logic-based consistency validation.
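To illustrate how domain and range constraints can expose inconsistent entries, the following Python sketch checks asserted property values against declared domains and ranges. It is an illustration only, not the crop-pest ontology's machinery: the class hierarchy, the individuals, and the property `is_element_of` with domain `plant_part` and range `plant` are hypothetical stand-ins. Note that in OWL itself, domain and range axioms are used to infer types rather than to reject data; the strict check below mimics the closed-world style of validation described in the text.

```python
# Sketch only: hierarchy, individuals, and the property is_element_of
# (domain plant_part, range plant) are hypothetical.
from typing import Iterable, Iterator, Optional, Tuple

# child class -> parent class (single inheritance keeps the example short)
SUPERCLASS = {
    "leaf": "plant_part",
    "plant_part": "thing",
    "peanut": "plant",
    "plant": "organism",
    "corn_earworm": "insect",
    "insect": "organism",
    "organism": "thing",
}

# property -> (domain class, range class)
PROPERTIES = {"is_element_of": ("plant_part", "plant")}

# individual -> asserted class
TYPES = {"leaf1": "leaf", "peanut1": "peanut", "cew1": "corn_earworm"}


def is_subclass(cls: Optional[str], ancestor: str) -> bool:
    """True if cls is ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def violations(triples: Iterable[Tuple[str, str, str]]) -> Iterator[str]:
    """Report each subject or object outside its property's domain or range."""
    for subj, prop, obj in triples:
        domain, rng = PROPERTIES[prop]
        if not is_subclass(TYPES[subj], domain):
            yield f"{subj} is not a {domain} (domain of {prop})"
        if not is_subclass(TYPES[obj], rng):
            yield f"{obj} is not a {rng} (range of {prop})"


asserted = [
    ("leaf1", "is_element_of", "peanut1"),  # satisfies domain and range
    ("cew1", "is_element_of", "peanut1"),   # an insect is not a plant part
]

for message in violations(asserted):
    print(message)
```

Only the second triple is reported, because `cew1` is an insect and therefore falls outside the domain `plant_part`.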

The Ability to Represent Complicated Concepts

When representing knowledge in the crop-pest domain, thesaurus and ontology developers face the difficulty of describing complicated concepts. For example, the concept "large corn earworm larva on peanut leaf" is relatively complicated to

represent. Here we explore the ability to represent complicated concepts in NALT and the

crop-pest ontology through a process of creating the concept "large corn earworm larva

on peanut leaf." To begin with, we state explicit meanings of the given concept:

* The size of corn earworm is large.
* The developmental stage of corn earworm is a larval stage.
* This corn earworm is located on a peanut leaf.

Then, we examine how to describe these meanings in both NALT and the crop-pest

ontology. In NALT, terms related to the crop-pest domain, such as corn earworm, larva, peanut, and leaf, are selected from the given concept. For example, corn earworm, peanut, and larva are described in NALT as shown in Figure 3-3.


Corn earworm
Use:
Helicoverpa zea
Related Terms:
Larva
peanut
leaf

peanut
Used for:
ground nuts (British) *
groundnuts (British) *
Broader Terms:
legumes
nuts
oilseeds
peanut products
Related Terms:
Arachis hypogaea

larva
Used for:
caterpillars *
grubs *
insect larvae
Broader Terms:
developmental stages
Narrower Terms:
chiggers
fish larvae
microfilariae
nematode larvae
schistosomula
sporocysts (Trematoda)
Related Terms:
larval development
larvicides

Figure 3-3. Three terms, corn earworm, peanut and larva, in the NALT thesaurus

When terms are selected from the given concept in this way, the explicit meanings within the concept are lost. For example, we cannot determine how big the larva is or where it is located.









Representing the example phrase requires a syntactic and semantic analysis of its words, that is, identifying which word is the main word of the phrase and which words modify it. In this example, we identified the main word (i.e., the head) of the concept "large corn earworm larva on peanut leaf" as "corn earworm." The head, "corn earworm," was therefore represented as a class in the crop-pest ontology. The other words, called modifiers, modify the head "corn earworm." These modifiers were represented as properties of the class corn earworm, together with their domains and ranges, as shown in Figure 3-4.

Property               Domain   Range
Size of                Insect   concepts that indicate sizes, such as large, medium, and small
Development stage of   Insect   concepts that indicate developmental stages of insects, such as egg, larva, pupa, and adult
Locate in              Insect   Plant or Part of Plant
Figure 3-4. Properties assigned to the class corn earworm for describing the concept "large corn earworm larva on peanut leaf"
Then, an individual of the class "corn earworm," called "corn earworm egg1," was created, with these properties filled in as shown in Figure 3-5. Finally, we formally represented the given concept "large corn earworm larva on peanut leaf" as an individual of the class "corn earworm" in the crop-pest ontology, without losing any of its explicit meanings.

Individual(a:corn_earworm_egg1
  type(a:corn_earworm)
  value(a:is_pest_of a:peanut1)
  value(a:locate_in a:peanut1)
  value(a:has_size a:large1)
  value(a:has_developmental_stage a:egg1))
Individual(a:egg1
  type(a:egg))
Individual(a:large1
  type(a:large))
Individual(a:peanut1
  type(a:peanut)
  value(a:is_host_of a:corn_earworm_egg1))









Figure 3-5. The OWL abstract syntax for an individual of the class "corn earworm" representing the complicated concept "large corn earworm larva on peanut leaf" 6
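A minimal sketch, assuming the Figure 3-5 individual is stored as plain (subject, predicate, object) triples with the `a:` namespace prefix dropped, shows how the explicit meanings of size, developmental stage, and location can be read back from the individual, which a bare list of thesaurus terms cannot do. The helper names (`objects`, `class_of`, `describe`, `pest_host_links_match`) are hypothetical, and the values simply mirror the figure.

```python
# Sketch only: the Figure 3-5 individual as (subject, predicate, object) triples.
TRIPLES = {
    ("corn_earworm_egg1", "type", "corn_earworm"),
    ("corn_earworm_egg1", "is_pest_of", "peanut1"),
    ("corn_earworm_egg1", "locate_in", "peanut1"),
    ("corn_earworm_egg1", "has_size", "large1"),
    ("corn_earworm_egg1", "has_developmental_stage", "egg1"),
    ("egg1", "type", "egg"),
    ("large1", "type", "large"),
    ("peanut1", "type", "peanut"),
    ("peanut1", "is_host_of", "corn_earworm_egg1"),
}


def objects(subject: str, predicate: str) -> list:
    """All objects linked to subject by predicate, sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def class_of(individual: str) -> str:
    """The asserted class of an individual (each has exactly one here)."""
    return objects(individual, "type")[0]


def describe(individual: str) -> dict:
    """Recover the explicit meanings that a list of thesaurus terms loses."""
    return {
        "class": class_of(individual),
        "size": class_of(objects(individual, "has_size")[0]),
        "stage": class_of(objects(individual, "has_developmental_stage")[0]),
        "location": class_of(objects(individual, "locate_in")[0]),
    }


def pest_host_links_match() -> bool:
    """Check that every is_pest_of link has the reciprocal is_host_of link."""
    return all((o, "is_host_of", s) in TRIPLES for s, p, o in TRIPLES if p == "is_pest_of")


print(describe("corn_earworm_egg1"))
print(pest_host_links_match())
```

The reciprocal check is one example of the kind of consistency rule that the formal representation makes easy to automate.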

Reasoning Based on Representation

Reasoning Facilities

In the NALT thesaurus, BT and NT relationships support only simple inference based on generalization and specialization. In Figure 3-1, plants is treated as a specific kind of organism through its BT relationship, and algae and seaweeds is likewise treated as a specific kind of plant through an NT relationship, provided that the NALT thesaurus can be converted into a formal language and its BT/NT/RT relationships are consistent. In the crop-pest ontology, however, reasoning goes well beyond generalization and specialization. A concept beetle in the crop-pest ontology inherits all properties and associations from its superclass insect, based on generalization/specialization rules. Here, we explain an inference with the following example from the crop-pest ontology. We first assert three true propositions in the crop-pest domain:

* Corn earworm is an insect that can damage peanut plants.
* An insect is an agent.
* Peanut pest is an agent that can damage peanut plants.

Based on these three propositions, we can infer that corn earworm is also one of the

peanut pests, using existential quantification. Following are the steps to perform the same

inference in the crop-pest ontology.

We define a class corn earworm by asserting the first proposition (see Figure 3-6, Part A). A class agent is a union of organisms, such as insects, bacteria, or fungi, and non-organisms, such as viruses or prions, that cause events (see Figure 3-6, Part B). Through the hierarchy of classes, we show that corn earworm is an insect, invertebrate, animal, organism

6 "a" denotes a particular namespace, at http://www.owl-ontologies.com/unnamed.owl










and agent by a process of reasoning. A class peanut_pest is defined as the intersection of agent and the restriction that its property damage_to_peanut has some value from peanut (see Figure 3-6, Part C). Therefore, the reasoner automatically deduces that the class corn earworm is a subclass of peanut_pest, as shown in Figure 3-6, Part D.


A) Class(a:corn_earworm partial a:insect
  restriction(a:damage_to_peanut someValuesFrom(a:peanut)))

ObjectProperty(a:damage_to_peanut
  domain(a:agent)
  range(a:peanut))

B) Class(a:agent complete
  unionOf(a:non_organism a:organism))

ObjectProperty(a:make_eventof
  domain(a:agent)
  range(a:event))

Class(a:animal partial a:organism)
Class(a:invertebrate partial a:animal)
Class(a:insect partial a:invertebrate)

C) Class(a:peanut_pest complete
  intersectionOf(restriction(a:damage_to_peanut someValuesFrom(a:peanut)) a:agent))


D) [Diagram; recovered labels: insect, corn earworm, damage to peanut, inference]

Figure 3-6. The OWL abstract syntax for the inference "corn earworm is a peanut pest." A, B, and C show how the ontology describes propositions in the domain. D is a diagram of the result of the inference. A circle indicates a class; a line indicates relationships between classes and subclasses; a dotted line denotes a property of two classes; and a red line indicates that corn earworm has been newly classified under peanut pest as the result of the inference.

In addition to the above examples, many other logical inferences can be made from

entries in the crop-pest ontology.
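The inference in Figure 3-6 can be sketched as a tiny structural classifier in Python. This is a simplification, not how Pellet works internally: it handles only named superclasses, one defined class, and existential restrictions. It assumes, as the first proposition states, that corn_earworm carries the restriction "damage_to_peanut some peanut", and it places organism and non_organism under agent because agent is their union. The dictionaries and function names are hypothetical.

```python
# Sketch only: a structural classifier for the Figure 3-6 example,
# not a full description-logic reasoner.

# Told superclasses; organism and non_organism sit under agent because
# agent is defined as unionOf(non_organism, organism).
PARENTS = {
    "corn_earworm": {"insect"},
    "insect": {"invertebrate"},
    "invertebrate": {"animal"},
    "animal": {"organism"},
    "organism": {"agent"},
    "non_organism": {"agent"},
    "peanut": {"plant"},
    "plant": {"organism"},
}

# Existential restrictions on named classes: class -> {(property, filler)}.
# Assumption: corn_earworm is asserted to have damage_to_peanut some peanut.
SOME = {"corn_earworm": {("damage_to_peanut", "peanut")}}

# Complete (defined) classes: name -> (named conjuncts, existential conjuncts).
DEFINED = {"peanut_pest": ({"agent"}, {("damage_to_peanut", "peanut")})}


def ancestors(cls: str) -> set:
    """cls together with every class above it."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def restrictions(cls: str) -> set:
    """Existential restrictions that cls has or inherits."""
    return {r for a in ancestors(cls) for r in SOME.get(a, ())}


def subsumed_by(cls: str, defined: str) -> bool:
    """True if cls meets every conjunct in the definition of `defined`."""
    named, exists = DEFINED[defined]
    own = restrictions(cls)
    return named <= ancestors(cls) and all(
        any(prop == p and filler in ancestors(f) for p, f in own)
        for prop, filler in exists
    )


for cls in sorted(PARENTS):
    supers = [d for d in DEFINED if subsumed_by(cls, d)]
    if supers:
        print(cls, "is inferred to be a subclass of", supers)
```

Only corn_earworm is classified under peanut_pest: it reaches agent through the hierarchy and carries the required restriction, while insect, for example, reaches agent but has no damage_to_peanut restriction.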

Searching Documents

The NALT thesaurus brings more relevant documents into a search through query expansion. An end user's keyword is first matched against NALT to find relevant terms, such as those listed under Used for. Once a relevant term is selected, it is added to the keyword search of documents in a publication. This query expansion can return more relevant documents to end users and thus improve search, but it can also bring in more irrelevant documents. In addition, browsing the controlled vocabulary in NALT gives users who lack an exact keyword clues for finding relevant documents.
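Thesaurus-based query expansion of this kind can be sketched in a few lines of Python. The records below are a toy fragment built from the examples earlier in this chapter (beetles/coleoptera, mechanical injuries/mechanical damage, crop yield and its narrower terms, insects and insecticides); they are not real NALT data, and `expand` and `search` are hypothetical helpers rather than any NALT interface.

```python
# Sketch only: a toy fragment of thesaurus records, not real NALT data.
USE = {  # non-preferred term -> preferred term ("Use")
    "beetles": "coleoptera",
    "mechanical injuries": "mechanical damage",
}
NARROWER = {"crop yield": ["grain yield", "yield components"]}  # NT
RELATED = {"insects": ["insecticides"]}                          # RT


def expand(keyword: str, include_related: bool = False) -> list:
    """Expand a keyword into its preferred term, Used for terms, and narrower terms."""
    term = USE.get(keyword.lower(), keyword.lower())
    terms = [term]
    terms += sorted(k for k, v in USE.items() if v == term)  # "Used for" entries
    terms += NARROWER.get(term, [])
    if include_related:  # related terms widen recall but may add noise
        terms += RELATED.get(term, [])
    return terms


def search(documents: list, keyword: str) -> list:
    """Return documents whose text contains any expanded term."""
    terms = expand(keyword)
    return [d for d in documents if any(t in d.lower() for t in terms)]


docs = ["Beetles on soybean", "A survey of Coleoptera", "Grain yield trial"]
print(expand("Beetles"))        # ['coleoptera', 'beetles']
print(search(docs, "beetles"))  # ['Beetles on soybean', 'A survey of Coleoptera']
```

Making related terms opt-in reflects the trade-off noted above: every added term can pull in more relevant documents, but also more irrelevant ones.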

In the crop-pest ontology, searching is a reasoning process. Whenever end users search for information using keywords within the crop-pest ontology, the ontology executes a reasoning process to find answers. This can produce better results because the ontology retrieves relevant information using knowledge that is not only asserted manually by experts (as in NALT) but also inferred automatically. For example, we searched for "peanut pest" in both NALT and the crop-pest ontology. As Figure 3-7 shows, the preferred term in NALT closest to "peanut pest" is "plant pests." NALT could provide relevant terms for such a search, and bring in further relevant information, only after experts had manually asserted peanut pests in the thesaurus.

We showed above how the ontology deduces that corn earworm is a peanut pest. Using the same method of deduction, we can retrieve information on all peanut pests, including both those found by deduction and those given by manual assertions such as "A is a peanut pest."
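A short Python sketch illustrates how such a search combines deduced and asserted answers. All names and facts are hypothetical illustrations: `pest_A` stands in for a manual assertion such as "A is a peanut pest", and the deduction mirrors the peanut pest definition (an agent that damages peanut) in its simplest form.

```python
# Sketch only: every name and fact below is a hypothetical illustration.
AGENTS = {"corn_earworm", "bean_leaf_beetle", "pest_A"}
DAMAGES = {  # agent -> hosts it is asserted to damage
    "corn_earworm": {"peanut", "corn"},
    "bean_leaf_beetle": {"soybean"},
}
ASSERTED_PEANUT_PESTS = {"pest_A"}  # stands in for "A is a peanut pest"


def peanut_pests() -> dict:
    """Map each peanut pest to how it was found: by deduction or by assertion."""
    found = {a: "inferred" for a in AGENTS if "peanut" in DAMAGES.get(a, set())}
    for agent in ASSERTED_PEANUT_PESTS:
        found.setdefault(agent, "asserted")
    return found


print(sorted(peanut_pests().items()))
# [('corn_earworm', 'inferred'), ('pest_A', 'asserted')]
```

Recording how each answer was found keeps the deduced results distinguishable from the expert assertions that a thesaurus alone would rely on.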



plant pests
Used for:
pests of plants
Used for AND type:
hosts of plant pests *
Broader Terms:
pests
plant health
Related Terms:
birds
botany
crop entomology
disease and pest management
forest health
forest insects
insect pests
insecticides
mites
mollusc control
molluscicides
molluscs
nematicides
nematodes
nematology
plant parasitic nematodes
plant pathogens
plant protection
slugs
snails

Figure 3-7. Screen shot of the preferred term "plant pests" in NALT


Automatic Validation of Logical Consistency


Validating NALT and the crop-pest ontology is one of the most important issues in providing better applications for searching information and cataloguing documents. If the thesaurus or the ontology has not been established to be logically correct, we cannot expect good results from any application that relies on it.


In NALT, all terms and relationships are validated manually by field experts. For example, throughout the development of the first edition of the NALT thesaurus, Agricultural Research Service (ARS 2005) scientists and specialists in the field of agriculture manually conducted its validation.












In the crop-pest ontology, all classes, properties, domains, ranges, and individuals were verified by experts. In addition, the deductive ability of the crop-pest ontology can itself assist in validation. Because these deductions are inferences executed by a machine, once the crop-pest ontology is developed, it can be validated automatically by checking whether its components, such as classes and properties, yield logically correct deductions. Automatic validation of classes and properties was performed with Pellet, a reasoner written in Java and designed specifically for OWL reasoning (Pellet 2005). Pellet checked the consistency of the crop-pest ontology and reported any unsatisfiable classes within it, as shown in Figure 3-8.
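The idea of an unsatisfiable class can be sketched in Python for one simple case: a class that inherits from two classes declared disjoint can have no instances. A reasoner such as Pellet detects this and many other cases; the hierarchy, the disjointness axioms, and the deliberately faulty `mislabeled_pest` entry below are hypothetical examples.

```python
# Sketch only: detects one source of unsatisfiability (inheriting from two
# disjoint classes). The hierarchy and disjointness axioms are hypothetical.
PARENTS = {
    "insect": {"animal"},
    "animal": {"organism"},
    "plant": {"organism"},
    "virus": {"non_organism"},
    "corn_earworm": {"insect"},
    "mislabeled_pest": {"insect", "plant"},  # deliberately faulty entry
}
DISJOINT = [frozenset({"animal", "plant"}), frozenset({"organism", "non_organism"})]


def ancestors(cls: str) -> set:
    """cls together with every class above it."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable(classes) -> list:
    """Classes whose ancestors include both members of some disjoint pair."""
    return [c for c in sorted(classes) if any(pair <= ancestors(c) for pair in DISJOINT)]


print(unsatisfiable(PARENTS))  # ['mislabeled_pest']
```

A class flagged this way indicates a modeling error for the experts to correct.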


A. [Screenshot of the Pellet "OWL Consistency Checker" web page: an input field for the URI or text of an OWL ontology, with options to check ontology consistency, find unsatisfiable concepts, and display the class hierarchy]
B. Results
Input file: Text area
OWL Species: Full
DL Expressivity: ALCRH
Consistent: Yes
Time: 1907 ms (Loading: 1644, Preprocessing: 0, Species Validation: 251, Consistency: 12)
Figure 3-8. Screenshots of the Pellet OWL consistency checker and its results. A shows the front page of the OWL consistency checker. B shows that the crop-pest ontology is consistent.











Conclusion

The differences between a thesaurus and an ontology were explored through a practical comparison of NALT and the crop-pest ontology. The fundamental differences in how they represent domain knowledge were the formality of language in the crop-pest ontology; the logical consistency of concepts and relationships in the crop-pest ontology; and the explicit description of complicated concepts, which NALT cannot provide. Formality allowed a machine to process information within the ontology more readily.

Previous studies have converted thesauri into formal languages such as the Resource Description Framework (RDF), though they reported several conversion problems arising from the relations in the thesauri (Matthews 2002). Logical consistency of concepts and properties, in turn, enables more powerful automated reasoning. The ambiguity of relations in the thesaurus was analyzed, in particular broader term (BT) and narrower term (NT) relationships, which can inadvertently be used ambiguously (e.g., to state that a concept is a special case of another concept, or that a concept is part of another concept).

The differences between the data representations of the two technologies lead to different levels of reasoning power in their applications. In the NALT thesaurus, relationships such as BT or NT support only simple inference based on generalization and specialization, assuming the underlying relationships are valid. In the crop-pest ontology, however, reasoning goes well beyond generalization and specialization. It supports the deduction of conclusions from true propositions, the search for information as a result of inference, and the automatic validation of logical consistency in the ontology. We conclude that, of the two studied








systems, an ontology provides the better representation of domain knowledge and a

greater power of reasoning based on the underlying representation, which could improve document searching in agricultural publications.

















CHAPTER 4
BUILDING A DATABASE AND GRAPHICAL USER INTERFACE FOR
BROWSING IMAGES BASED ON THE CROP-PEST ONTOLOGY

Introduction

The most common interface for image retrieval is one that allows users to type

keywords and see search results in a table ordered by relevance. This type of interface,

such as Google Image Search (Google Image Search, 2005), performs well with a large

pool of Web images, but it does not allow users to browse images (Yee, 2003). Markkula reported that professionals in artistic fields such as journalism, design, and art direction use browsing as a basic strategy when searching for images, for four reasons. First, browsing aids in the development of illustration ideas. Second, some words describing the selected images may be difficult to think of as search keywords but are easily applied once the images are seen. Third, image selection depends on the particular work situation, which is difficult to anticipate in indexing. Fourth, artistic professionals feel comfortable with browsing. Similarly, growers and other professionals in the agricultural field would likely want to be able to browse images as well as search for them with keywords.

Indexing images is vitally important to image browsing. One commonly used

approach is using a thesaurus to index images (Hyvonen, 2004). An image can be

categorized by a thesaurus that classifies different aspects of images into hierarchical

categories. However, a thesaurus turns out to provide only part of the knowledge needed when a knowledge-rich description of images is required for indexing them. Wielinga stressed that a structured, knowledge-based description of images is much richer than the traditional "set of terms" provided by a thesaurus (Wielinga, 2001). As shown in Chapter 3, an ontology can remedy the deficiencies of a thesaurus in describing domain knowledge by using a formal representation of concepts.

Hyvonen implemented an ontology-based image retrieval system for a photo

exhibition using the promotion image database of the Helsinki University Museum

(Hyvonen, 2003). In the system, images were annotated according to a promotion

ontology. The ontology helped users formulate their queries. Schreiber reported a system

of ontology-based photo indexing and searching in an image collection about apes,

including chimpanzees, gorillas, and orangutans (Schreiber, 2001). He developed a

domain ontology for annotating ape photographs.

As described in Chapter 2, the crop-pest ontology was developed for browsing a collection of 291 images. The ontology can be used as a tool for cataloging those images. This chapter describes how the images were indexed using the crop-pest ontology, introduces the implementation of a graphical interface for browsing them, and presents a usability study.

Ontology-Based Image Indexing

Several studies have used ontologies as tools for indexing information. Desmontils explored indexing Web pages using a terminology-oriented ontology and presented a semi-automatic process for indexing a Web site with the ontology (Desmontils, 2002). He argued that an ontology-based indexing approach could provide more precise retrieval within a given Web site. Tsinaraki indexed audiovisual information, such as images and videos, using an ontology.









Tsinaraki suggested that indexing multimedia information with the same ontologies across different multimedia standards provides interoperability across applications (Tsinaraki, 2005). Tsinaraki also pointed out the most interesting aspect of ontology-based indexing: through domain-specific ontologies, the approach supports not only simple retrieval of audiovisual content with keyword queries but also enhanced content-based retrieval with semantic queries, such as "give me video clips where the players Ronald or Beckham appear." Since user requests in the agricultural field involve complex as well as simple concepts based on domain knowledge, ontology-based image indexing is a promising approach to fulfilling user needs.

Creating Concepts for Indexing Images in the Crop-Pest Ontology

New classes, properties, and instances were added to the crop-pest ontology to index the 291 images. First, the class "digital photograph" was created in the crop-pest ontology, and each actual image became an instance of this class. The class inherits common properties from its superclasses: "thing," "physical," "object," "self-connected object," "substance," and "photograph." Figure 4-1 shows the hierarchy from the root class "thing" down to "digital photograph." The class "digital photograph" has several subclasses, such as "pest photograph," also shown in Figure 4-1.

Second, several properties were assigned to the "digital photograph" class. A manual analysis of the 291 image captions revealed four common relationships, which were assigned as four properties of the class:

* damage: any process caused by an agent
* agent (pest): insects that cause damage
* host: plants that are attacked by insects
* location: the place where plants or insects are located

Third, each image was defined as an individual of the class "digital photograph." Based

on the content of each image, values of each property for each individual were specified

during the process of indexing images, as explained in the next section.

The Indexing Process

This process builds a structured index of images according to the crop-pest

ontology. The indexing process can be divided into five steps, shown in Figure 4-2.

Step 1: Syntactic and Semantic Analysis of Each Image Caption

All 291 images have captions, provided by domain experts, that describe the content of the images. The image captions, shown in Figure 4-2, Part A, were analyzed both syntactically and semantically. Syntactic analysis is based on the grammatical structure of the captions, and semantic analysis is based on the meaning of each word in a caption and on the relationships among word meanings established by the syntax. These analyses extract the domain knowledge implied by words or phrases in the image captions as well as the content of the images. For example, the image caption shown in Figure 4-2, "damage caused by three cornered alfalfa hopper on soybean," was analyzed syntactically and semantically.

As shown in Figure 4-3, the word "damage" is the head, or main word, of the phrase. It is followed by two modifiers that qualify its meaning: the past-participial phrase "caused by three cornered alfalfa hopper," used like an adjective phrase, and the prepositional phrase "on soybean." The semantic analysis was based on the syntactic analysis. The first modifier indicates the agent that causes the damage, the specific pest "three cornered alfalfa hopper." The second modifier describes not only the location of the damage, "soybean," but also implies the host of the pest "three cornered alfalfa hopper," because the content of the image is restricted to the crop-pest domain. These syntactic and semantic analyses of the caption provided the information necessary to index the image in the crop-pest ontology.
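The analysis in Step 1 was done manually, but for captions that share this exact shape, the result can be sketched with a regular expression in Python. The pattern and the `analyze_caption` helper are hypothetical and cover only captions of the form "<head> caused by <agent> on <host>"; any other caption returns None and would still need manual analysis.

```python
import re
from typing import Optional

# Hypothetical pattern covering only captions shaped like
# "<head> caused by <agent> on <host>"; other shapes need manual analysis.
CAPTION = re.compile(r"^(?P<head>\w+) caused by (?P<agent>.+?) on (?P<host>.+)$")


def analyze_caption(caption: str) -> Optional[dict]:
    """Fill the four "digital photograph" properties from one caption."""
    match = CAPTION.match(caption.strip().lower())
    if match is None:
        return None
    host = match["host"]
    return {
        "damage": match["head"],   # the head of the phrase
        "agent": match["agent"],   # from the past-participial phrase
        "host": host,              # implied, since the domain is crop-pest
        "location": host,          # from the prepositional phrase
    }


print(analyze_caption("damage caused by three cornered alfalfa hopper on soybean"))
```

The host and location receive the same value here, mirroring the semantic analysis above, in which "on soybean" supplies both.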

Step 2: Creating an Individual of the Image in the Crop-Pest Ontology

Each image becomes an individual of class "digital photograph" in the crop-pest

ontology because each image was a real and specific example of the class "digital

photograph." For example, the image with the caption "damage caused by three cornered

alfalfa hopper on soybean" was created as an individual of the class "digital photograph."

The individual called "Image SOY006" represents the image with the caption "damage

caused by three cornered alfalfa hopper on soybean."

Step 3: Filling in the Values of Properties for Each Individual

Each individual created in Step 2 has the four properties defined for the class "digital photograph." The values of these properties were filled in with information provided by the syntactic and semantic analyses. For example, the four properties of the individual "Image SOY006" from Step 2 were assigned values as follows:

* damage: damage
* agent (pest): three cornered alfalfa hopper
* host: soybean
* location: soybean

Step 4: Connecting Assigned Values to Classes/Individuals in the Crop-Pest Ontology

Values assigned to the four properties for each individual were connected with

classes and individuals in the crop-pest ontology. This mapping of property values onto

the crop-pest ontology provides access to domain knowledge while browsing through









classes and individuals within the ontology. The individual "Image SOY006," for example, has "three cornered alfalfa hopper" as the value of the property "agent." This value was mapped to the class "three cornered alfalfa hopper" in the crop-pest ontology. The class then exposes domain knowledge by showing 1) what the concept is and 2) how it relates to other classes and individuals. Likewise, the value "soybean" was mapped to the class "soybean."

Step 5: Saving the Individual into the Crop-Pest Ontology

The index of each image produced in Steps 1 through 4 was saved as an individual in the crop-pest ontology.
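Steps 2 through 5 can be sketched in Python as follows, assuming the property values from Step 3 are already available. The `CLASS_INDEX` lookup table, the `DigitalPhotograph` record, and the dictionary used as an ontology store are hypothetical simplifications. A real implementation would write OWL individuals rather than Python objects.

```python
from dataclasses import dataclass

# Hypothetical lookup from caption phrases to class names in the ontology.
CLASS_INDEX = {
    "damage": "damage",
    "three cornered alfalfa hopper": "three_cornered_alfalfa_hopper",
    "soybean": "soybean",
}


@dataclass
class DigitalPhotograph:
    """An individual of the class "digital photograph" (Step 2)."""
    name: str
    damage: str
    agent: str
    host: str
    location: str


def index_image(name: str, values: dict, ontology: dict) -> DigitalPhotograph:
    """Steps 2-5: map values onto classes, build the individual, and save it.

    Raises KeyError when a value has no matching class, flagging it for review.
    """
    mapped = {prop: CLASS_INDEX[value] for prop, value in values.items()}  # Step 4
    individual = DigitalPhotograph(name=name, **mapped)                    # Steps 2-3
    ontology.setdefault("digital_photograph", []).append(individual)       # Step 5
    return individual


ontology = {}
values = {
    "damage": "damage",
    "agent": "three cornered alfalfa hopper",
    "host": "soybean",
    "location": "soybean",
}
image = index_image("Image_SOY006", values, ontology)
print(image.agent)  # three_cornered_alfalfa_hopper
```

Failing loudly on unmapped values is a design choice. It keeps every saved individual connected to classes in the ontology, which the browsing interface relies on.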

An Interface for Browsing Images

Goals

The first goal was to provide growers, county agents, and other agricultural users with an interface that would improve their ability to locate images compared with a keyword-based search interface. Users do not necessarily know which keywords to type to find images; a keyword-based approach is useful when users are familiar with the kinds of images in a collection and/or know which concepts retrieve relevant images. A browsing facility can help users who are not familiar with an image collection retrieve images easily and effectively, although users must carefully guide themselves using the textual or graphical indications of the content reachable via each link (Olston, 2003).

The second goal was to let users acquire the domain knowledge described by the crop-pest ontology while browsing concepts or images. A facility that shows users the concept relationships in the domain of an image collection can address a limitation of keyword-based interfaces, which do not provide such domain knowledge.

Features to be Supported

The graphical user interface had to support several features. First, a facility for browsing images was required for users who are not familiar with the image collection or who lack domain knowledge. Second, because classes in the crop-pest ontology are arranged in a hierarchical structure, a facility to visualize this class hierarchy was required. Third, a facility to show properties of the crop-pest ontology was required, so that properties relating individuals to each other or to data values are visible to users. Fourth, since each image is related to other concepts in the crop-pest ontology, a facility to show the concepts related to each image was required. Fifth, a facility to find all images related to a particular class in the crop-pest ontology was required.

The Graphical Interface

All features required to achieve these goals were implemented in the interface. First, the facility to browse images was implemented by modifying TouchGraph, an open source program for data visualization (TouchGraph, 2005). The interface is shown in Figure 4-4, with the browsing facility in the center. The display contains a graph whose nodes are drawn as rectangles, ovals, or image thumbnails; a rectangular node represents a class in the crop-pest ontology, and an oval node represents an individual. Properties are displayed as edges, the gray lines connecting nodes in Figure 4-4. When a user clicks a node, its neighboring nodes are expanded or hidden, letting users control which nodes are visible. Users can browse images by expanding nodes, as shown in Figure 4-5. The facility to visualize the hierarchical structure of classes in the crop-pest ontology was implemented using node colors and shapes. Classes were represented by rectangular nodes, which can have three colors: an orange rectangle represents the selected class, a blue rectangle a superclass, and a green rectangle a subclass. Since the interface is intended for growers and other professionals who might not be familiar with the components of an ontology, it avoids technical terminology such as class, subclass, and superclass. Instead, the meaning of the colors ("current selected: orange, more general: blue, more specific: green") is given in the interface, as shown in Figure 4-5.
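The click-to-expand behavior and the color scheme described above can be modeled in a short Python sketch. This is not TouchGraph's API (TouchGraph is a Java program); `BrowseGraph` and `node_color` are hypothetical names, and the rule for hiding nodes (hide a neighbor unless another expanded node still connects to it) is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


class BrowseGraph:
    """Click-to-expand browsing over an undirected graph of nodes."""

    def __init__(self, edges: Iterable[Tuple[str, str]]):
        self.neighbors = defaultdict(set)
        for a, b in edges:
            self.neighbors[a].add(b)
            self.neighbors[b].add(a)
        self.visible: set = set()
        self.expanded: set = set()

    def click(self, node: str) -> list:
        """Toggle a node: show its neighbors, or hide those no other expanded node needs."""
        self.visible.add(node)
        if node in self.expanded:
            self.expanded.discard(node)
            for n in self.neighbors[node]:
                if n not in self.expanded and not (self.neighbors[n] & self.expanded):
                    self.visible.discard(n)
        else:
            self.expanded.add(node)
            self.visible |= self.neighbors[node]
        return sorted(self.visible)


def node_color(node: str, selected: str, parent_of: Dict[str, str]) -> Optional[str]:
    """Orange for the selected class, blue for its superclass, green for its subclasses."""
    if node == selected:
        return "orange"  # "current selected"
    if parent_of.get(selected) == node:
        return "blue"    # "more general"
    if parent_of.get(node) == selected:
        return "green"   # "more specific"
    return None          # not a directly related class


g = BrowseGraph([("insect", "corn_earworm"), ("insect", "stink_bug"), ("stink_bug", "image_1")])
print(g.click("insect"))     # ['corn_earworm', 'insect', 'stink_bug']
print(g.click("stink_bug"))  # ['corn_earworm', 'image_1', 'insect', 'stink_bug']
print(g.click("stink_bug"))  # ['corn_earworm', 'insect', 'stink_bug']
```

Keeping the visible set separate from the expanded set lets a second click on a node hide only the neighbors it revealed, so users control the range of nodes on screen.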

The facility to show properties of the crop-pest ontology was implemented using edges. There are two kinds of edges: edges without a label and edges with a label. An edge with a label represents a property in the crop-pest ontology, as shown in Figure 4-6. The thick end of each labeled edge indicates the domain of the property and the thin end indicates its range. For example, in Figure 4-6 the property "is pest of" is shown as the edge labeled "is pest of", highlighted in black. The domain of the property is the class "insect" and its range is the class "plant", so the thick end of the black edge is at "insect" and the thin end is at "plant".
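The browsing behavior described above can be summarized as a small graph model. The following Python sketch is illustrative only and is not TouchGraph's API. The class `OntologyGraph` and its methods are hypothetical. Unlabeled edges are assumed to be subclass links. "Blue" and "green" are assumed to cover all superclasses and subclasses of the selected class, not only direct ones.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyGraph:
    """Hypothetical model of the browsing graph (not TouchGraph's API)."""
    kinds: dict                                   # node -> "class", "individual", or "image"
    parent: dict                                  # subclass -> superclass (assumed unlabeled edges)
    properties: list                              # labeled edges: (domain, label, range)
    visible: set = field(default_factory=set)     # nodes currently shown

    SHAPES = {"class": "rectangle", "individual": "oval", "image": "thumbnail"}

    def shape_of(self, node):
        return self.SHAPES[self.kinds[node]]

    def neighbors(self, node):
        """Superclass, subclasses, and property-linked nodes of `node`."""
        found = {child for child, par in self.parent.items() if par == node}
        if node in self.parent:
            found.add(self.parent[node])
        for domain, _label, rng in self.properties:
            if domain == node:
                found.add(rng)
            if rng == node:
                found.add(domain)
        return found

    def toggle(self, node):
        """A click expands hidden neighbors, or hides them if all are already shown."""
        nbrs = self.neighbors(node)
        if nbrs <= self.visible:
            self.visible -= nbrs
        else:
            self.visible |= nbrs
        self.visible.add(node)

    def ancestors(self, node):
        chain = []
        while node in self.parent:
            node = self.parent[node]
            chain.append(node)
        return chain

    def color_of(self, node, selected):
        """Color of a class node relative to the selected class."""
        if node == selected:
            return "orange"                       # current selected
        if node in self.ancestors(selected):
            return "blue"                         # more general
        if selected in self.ancestors(node):
            return "green"                        # more specific
        return None

    def labeled_edges(self):
        """Visible properties as (thick end = domain, label, thin end = range)."""
        return [(d, lab, r) for d, lab, r in self.properties
                if d in self.visible and r in self.visible]
```

Consider a graph with the classes organism, insect, stink bug, and plant. Clicking "insect" reveals organism (blue), stink bug (green), and plant, which is joined to insect by the labeled edge "is pest of". Clicking "insect" again hides them.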









A facility to show the relationships of each image with concepts in the crop-pest ontology was implemented. When an image is selected, all concepts related to the image are shown. This facility provides background knowledge about the image as well as its content. In addition, users can extend their retrieval in the image collection by following related concepts. For example, when the image pointed to by the black arrow in Figure 4-7 is selected, the related concepts bean leaf beetle, damage, and soybean are shown together with the related properties.

A facility to find all images related to a particular class in the crop-pest ontology was implemented. Users of this image collection might want to see all images related to a particular class without browsing nodes. This facility shows all images related to a particular class, given that the user enters the term for that class. For example, in Figure 4-8, the term "stink bug" entered in the input box at the top left of the screenshot was matched to the class "stink bug" in the crop-pest ontology by comparing the given term with the names of classes. When users click the button "show all images about stink bug" at the top right of the interface, a pop-up window shows all images that were manually assigned to individuals of the class "stink bug" and its subclasses.
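The two facilities above can be sketched over a toy ontology fragment. In this Python sketch, the data (`SUBCLASS_OF`, `INSTANCE_OF`, `IMAGE_LINKS`), the image and individual identifiers, and property names such as "shows pest" are hypothetical. Only the behavior follows the description: the typed term is matched exactly against class names, and images are collected through a class and all of its subclasses.

```python
# Hypothetical ontology fragment: class -> superclass, individual -> class,
# and image -> (property, linked individual or concept) pairs.
SUBCLASS_OF = {"green stink bug": "stink bug", "brown stink bug": "stink bug",
               "stink bug": "insect", "bean leaf beetle": "insect"}
INSTANCE_OF = {"gsb-1": "green stink bug", "sb-1": "stink bug", "blb-1": "bean leaf beetle"}
IMAGE_LINKS = {
    "img-1": [("shows pest", "blb-1"), ("shows symptom", "damage"), ("shows host", "soybean")],
    "img-2": [("shows pest", "gsb-1")],
    "img-3": [("shows pest", "sb-1")],
}


def related_concepts(image):
    """Concepts displayed around a selected image: (property, class of the linked value)."""
    return [(prop, INSTANCE_OF.get(value, value)) for prop, value in IMAGE_LINKS.get(image, [])]


def match_class(term):
    """Match a typed term to a class by comparing it with class names (case-insensitive)."""
    classes = set(SUBCLASS_OF) | set(SUBCLASS_OF.values())
    term = term.strip().lower()
    return next((c for c in sorted(classes) if c.lower() == term), None)


def with_subclasses(cls):
    """The class plus all of its direct and indirect subclasses."""
    result, changed = {cls}, True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def images_of_class(term):
    """All images linked to an individual of the matched class or any of its subclasses."""
    cls = match_class(term)
    if cls is None:
        return []
    targets = with_subclasses(cls)
    return sorted(img for img, links in IMAGE_LINKS.items()
                  if any(INSTANCE_OF.get(value) in targets for _prop, value in links))
```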

Usability Study

The graphical interface was designed to help growers, county agents, and other agricultural users locate images more effectively than with a keyword-based search interface. In addition, this interface was developed to let users acquire domain knowledge by browsing the crop-pest ontology while searching for images. The usability study was planned to evaluate the interface against these two objectives.









The Keyword-Based Search Interface

To compare the graphical interface with a conventional keyword-based search interface, a keyword-based search interface called the "baseline interface" was built using Egothor (Egothor, 2005), an open source, full-featured text search engine written entirely in Java. A total of 291 Web pages were generated to index the images; each Web page contains an image file (.jpg) and its image caption as text. Figure 4-9 shows the keyword-based interface and a Web page containing an image and its caption. The main page of the baseline interface provides an input box for typing search keyword(s) and a one-paragraph description of the domain. After users enter keyword(s), a list of links to the matching Web pages is shown. When users click a Web page in the list, the page shows an image and its caption.
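The baseline relies on Egothor's full-text indexing of the caption pages. The Python sketch below illustrates that idea only; it does not reproduce Egothor's API or its ranking. It builds a toy inverted index from page captions and returns the pages that contain every query term. The page identifiers are hypothetical, and so are all captions except "stink bug nymphs on soybean pod".

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """pages: page id -> caption text. Returns term -> set of page ids (an inverted index)."""
    index = defaultdict(set)
    for page, caption in pages.items():
        for term in tokenize(caption):
            index[term].add(page)
    return index


def search(index, query):
    """Pages containing every query term (boolean AND), sorted for a stable result list."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))
```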

Hypotheses to Be Tested

Based on the two goals of this interface, two hypotheses were tested.

* Hypothesis 1: The graphical interface will produce higher precision and recall than
the baseline interface.

* Hypothesis 2: The graphical interface will help users to learn more about the
crops, pests, and relationships between them.

Design and Procedure

Hypothesis 1 was tested by evaluating the precision and recall of the graphical interface and the keyword-based interface. The evaluation was carried out in three stages. In the first stage, the search terms used to determine precision and recall were selected. A total of 134 terms were collected from the terms that users typed in to search documents in an agricultural information system called the Electronic Data Information Source (EDIS). Based on the range of the image database covered by both interfaces, seven search terms were selected from the 134 search terms; they are shown in Table 4-1. In the second stage, both interfaces were queried with the selected search terms. Finally, the resulting data were analyzed.
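For reference, precision is the fraction of retrieved images that are relevant, and recall is the fraction of relevant images that are retrieved. A minimal Python sketch for one search term follows. It reports 0 when nothing is retrieved or nothing is relevant; this convention is an assumption, since the text does not say how such cases were scored.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall, in percent, for the results of one search term."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant images that were retrieved
    precision = 100.0 * hits / len(retrieved) if retrieved else 0.0
    recall = 100.0 * hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, retrieving four images of which two are among three relevant ones gives a precision of 50% and a recall of about 66.7%.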

Hypothesis 2 was tested by evaluating user satisfaction with a between-subjects design. Data from 20 participants were used in the analysis: 19 participants were from the Department of Agricultural and Biological Engineering at the University of Florida and one participant was from the Department of Horticulture at the University of Florida. This preliminary study was done with agricultural researchers, including 3 faculty members, 1 staff member, and 16 graduate students. All participants were Internet users who searched for information either every day or a few times per week, and who searched for images online either every day or a few times per week. Each participant used either the baseline interface or the graphical interface, assigned at random. Since users in practice receive no explanation before using an interface, participants were not introduced to the features of the interface; they were told only that both interfaces accessed the same image collection about crops and related pests. Throughout the study, subjective ratings were reported on a 4-point scale (strongly agree, agree, disagree, strongly disagree). A sample question is as follows:

* Did this interface help you learn more about crops, pests, and relationships between
them?

After finishing the retrieval of images in each interface, participants completed an evaluation. All of the questions asked are shown in Appendix F.

Results

The precision and recall for the seven search terms in both interfaces are shown in Table 4-1. A statistical comparison of precision between the graphical interface and the baseline interface showed that the precision of the graphical interface is higher than that of the baseline interface (t = -2.11, p = .02). In addition, the recall of the graphical interface is higher than that of the baseline interface (t = -2.10, p = .02). Therefore, hypothesis 1 was accepted.
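The text does not state which t-test was used. The sketch below assumes an unpaired two-sample t-test on the seven per-term values of Table 4-1, taken in the column order shown there (graphical minus baseline). Under that assumption it gives a t of about -2.11 for both precision and recall, close to the reported -2.11 and -2.10. The sign of t depends only on which sample is listed first. With equal sample sizes, the unpooled (Welch) and pooled statistics coincide. The p-value requires a t distribution and is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Unpaired two-sample t statistic for mean(a) - mean(b).

    Uses the unpooled standard error; for equal sample sizes this equals
    the pooled-variance Student's t statistic.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Per-term values from Table 4-1, in the order aphid, bug, green stink bug,
# laying egg, plant, stink bug, white fly.
graphical_precision = [0, 0, 0, 0, 50, 100, 0]
baseline_precision = [100, 100, 0, 0, 100, 100, 100]
graphical_recall = [0, 0, 0, 0, 3, 100, 0]
baseline_recall = [50, 58, 100, 0, 11.6, 100, 100]

print(round(two_sample_t(graphical_precision, baseline_precision), 2))
print(round(two_sample_t(graphical_recall, baseline_recall), 2))
```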

Hypothesis 2 was tested by analyzing responses to the question "Did this interface help you learn more about crops, pests, and relationships between them?" Table 4-2 shows the individual responses to this question. The statistical analysis shows no significant difference between the two interfaces (t = 2.9p = .46), so hypothesis 2 was rejected. However, a more advanced evaluation is needed, since the participants' responses indicate some difference. This evaluation used a between-subjects design, which treats differences among subjects (i.e., among participants) as error, whereas a within-subjects design can treat differences among subjects as a variable. In addition, the number of participants is small. Moreover, subjective ratings on a 4-point Likert scale might not be sensitive enough to capture precise responses from participants. Therefore, a more advanced evaluation should be considered using 1) a within-subjects design, 2) a larger number of participants, and 3) a wider range of subjective ratings.



Conclusion

Ontology-based image indexing was carried out using the crop-pest ontology. The new class "digital photograph" was created in the crop-pest ontology to index 291 images, and several properties related to "digital photograph" were assigned. The process of indexing images is as follows:

* Syntactic and semantic analysis of each image caption
* Creating an individual of the image in the crop-pest ontology
* Filling in the values of properties for each individual
* Connecting the assigned values to individuals in the crop-pest ontology
* Saving the individuals into the crop-pest ontology
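The indexing steps listed above can be sketched as follows, starting from a caption analysis that has already been done by hand (as in Figure 4-3). This is a simplified Python model, not the API of any ontology editor. The following are hypothetical: `Individual`, `index_image`, the property names "header", "agent", and "location", and the classes assigned to "damage", "three cornered alfalfa hopper", and "soybean".

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    cls: str                                          # class the individual belongs to
    properties: dict = field(default_factory=dict)    # property name -> linked Individual


def index_image(ontology, image_id, analysis):
    """Index one image from its (manually produced) caption analysis.

    `ontology` maps names to existing Individuals; `analysis` maps property names to terms.
    """
    photo = Individual(image_id, "digital photograph")     # create the image individual
    for prop, term in analysis.items():                    # fill in property values...
        if term not in ontology:
            raise KeyError(f"{term!r} has no individual in the ontology")
        photo.properties[prop] = ontology[term]            # ...connected to existing individuals
    ontology[image_id] = photo                             # save into the ontology
    return photo
```

Raising an error for an unknown term, rather than silently creating a new individual, keeps every property value tied to a concept that already exists in the ontology.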

After the 291 images were indexed, a graphical interface for browsing them was designed to give growers, county agents, and other agricultural users a tool that would improve their ability to locate images compared with a keyword-based search interface. In addition, this interface was intended to let users acquire the domain knowledge described by the crop-pest ontology. The graphical interface enables users to:

* Browse images
* Visualize the hierarchical structure of classes in the crop-pest ontology
* See the properties of the crop-pest ontology
* See the relationships of each image with concepts in the crop-pest ontology
* Find all images related to a particular class

A preliminary usability study was conducted for this graphical interface. Data from 20 participants (10 for each interface) were used in this analysis. For comparison with conventional keyword-based search, a keyword-based search interface was built. Based on the goals of the graphical interface, the following hypotheses were tested:

* Hypothesis 1: The graphical interface will produce higher precision and recall than
the baseline interface.

* Hypothesis 2: The graphical interface will help users to learn more about the
crops, pests, and relationships between them.

To test hypothesis 1, seven search terms were drawn from user inputs to EDIS, and the precision and recall of the graphical interface were found to be higher than those of the baseline interface. Because precision and recall measure the effectiveness of an information search, the higher precision and recall indicate that growers, county agents, and other agricultural users can locate images more effectively with the graphical interface than with a keyword-based search interface.

To test hypothesis 2, a usability study was conducted using a between-subjects design. Each participant used either the keyword-based search interface or the graphical interface based on the crop-pest ontology. Throughout the study, subjective ratings were reported on a 4-point Likert scale. Participants were asked the question "Did this interface help you learn more about crops, pests, and relationships between them?" Eight of the ten participants using the graphical interface agreed that it helped them learn more about the domain. By contrast, five of the ten participants using the keyword-based search interface agreed and two strongly disagreed. The statistical analysis showed that hypothesis 2 was rejected. However, the participants' responses suggest that a more advanced evaluation of the graphical interface is warranted. Such a study would be designed using 1) a within-subjects design, 2) a larger number of participants, 3) a wider range of subjective ratings, and 4) task-based image retrieval.










Table 4-1. The result of evaluation of precision and recall in both interfaces
                   Graphical interface            Baseline interface
Search term        Precision (%)   Recall (%)     Precision (%)   Recall (%)
Aphid              0               0              100             50
Bug                0               0              100             58
Green stink bug    0               0              0               100
Laying egg         0               0              0               0
Plant              50              3              100             11.6
Stink bug          100             100            100             100
White fly          0               0              100             100

Table 4-2. Participants' responses to the question "Did this interface help you learn more about crops, pests, and relationships between them?"
Response           Interface using keyword    Interface using ontology
Strongly agree     0                          0
Agree              5                          8
Disagree           3                          1
Strongly disagree  2                          0
N/A                0                          1
Total              10                         10

[Figure: the class pest_photograph ("photograph about insects") in its class hierarchy: owl:Thing > physical > object > self connected object > substance > photograph > digital photograph > pest photograph]

Figure 4-1. Hierarchy from the root class "thing" to "digital photograph."










[Figure: workflow for indexing one image. It starts from an image caption (e.g., "Damage caused by three-cornered alfalfa hopper on soybean") and proceeds through syntactic and semantic analysis on the image caption, creating an instance of class "digital photograph", filling values of properties of the instance, and connecting the values to other classes/instances in the ontology. It ends with saving the instance into the crop-pest ontology as an instance of the image.]

Figure 4-2. Overview of indexing images associated with the crop-pest ontology.


A. Syntactic analysis

damage caused by three cornered alfalfa hopper on soybean
(header: damage; modifier 1: three cornered alfalfa hopper; modifier 2: soybean)

B. Semantic analysis

damage
agent (pest): three cornered alfalfa hopper
location (host): soybean

Figure 4-3. Syntactic and semantic analysis of the image caption "damage caused by three cornered alfalfa hopper on soybean." A shows the result of syntactic analysis. B illustrates the result of semantic analysis.


Figure 4-4. The interface that provides a facility to browse 291 images. The browsing facility is located in the center of the interface.







Figure 4-5. Expanding a node (orange rectangle) shows an image. The black arrow (not part of the interface) points at the definition of node colors.


















Figure 4-6. A facility for showing properties. A property is shown by an edge with a label. The thick end of each edge represents the domain. The thin end of each edge represents the range.
































[Figure callout: the selected image]

Figure 4-7. The facility to show the selected image with related concepts in the crop-pest ontology.









[Figure callouts: a button to show all images related to stink bug; a window that shows all images related to stink bug]

Figure 4-8. The facility to show all images related to a particular class in the crop-pest ontology.


[Figure panels: A. The main page of the keyword-based search interface, titled "::egothor: Image search in crops and pests", with the query "stink bug" in the search box and the prompt "type a keyword(s) to retrieve images". B. A Web page containing an image and its caption.]

Figure 4-9. Screenshots of the keyword-based search interface implemented with Egothor. A shows the main page for searching images. B shows the Web page of an image and its caption "stink bug nymphs on soybean pod"; this Web page was used for indexing the image.














CHAPTER 5
ONTOLOGY-ASSISTED INFORMATION EXTRACTION

Introduction

The indexing of image captions associated with the crop-pest ontology is a process in which important information is extracted from the image captions based on syntactic and semantic analysis and then saved in a structured form that a reasoner can access and apply reasoning processes to. Chapter 4 described the manual

indexing of image captions. This chapter introduces a semi-automatic approach to the

creation of indexes based on syntactic and semantic analysis of natural language. The

creation of an index can be redefined as the process of extracting relevant information

from text and building a formal semantic structure for it. The process is called

information extraction. Information extraction is a process that identifies useful

information from natural language text from a particular domain and converts that

information to a structured form which can be saved into a database (Cardie, 1997). In

ontology-assisted image retrieval, the structured form that results from the information

extraction associated with the ontology is created as an individual of class "image" inside

the crop-pest ontology. SMES is one information extraction system that uses domain-

specific knowledge from an ontology (Maedche, 2002). SMES maps free text into a

domain-specific ontology containing target knowledge structures about crucial

information for answering questions such as who, what, whom, when, where or why.

The target knowledge structures are predefined by a given ontology.









Manual information extraction incurs a high time and labor cost for syntactic and

semantic analysis because domain experts analyze the syntactic and semantic structure of

natural language text based on their domain knowledge to create the structure. The semi-

automatic approach suggested here is based on a natural language parser, phrase patterns,

and the crop-pest ontology. Parsing is defined as the process of analyzing a continuous

stream of input such as text in order to determine its grammatical structure with respect to

a given set of grammar rules. A parser is a computer program that carries out this task.

Phrase patterns provide syntactic and semantic information to the parser. The crop-pest

ontology describes crops and related pests in a formal way, as discussed in Chapter 2. The three components function together to analyze natural

language text and create the semantic structure automatically. This approach is

considered to be semi-automatic rather than automatic since the phrase patterns were

created by humans and some manual validation of the automatically created structures is

required.

The organization of this chapter is as follows. The components of the information

extraction system will be introduced: 1) ontology, 2) phrase patterns and 3) island chart

parser. Semantic structures as the outputs of the system will be described as well. The

results from the information extraction system are presented and discussed.

Components in Information Extraction System

The information extraction system consists of 1) the crop-pest ontology as a source

of domain specific knowledge, 2) an island chart parser which parses given input strings

and converts them into semantic structures, and 3) phrase patterns which provide specific

grammar for the island chart parser (Figure 5-1). Inputs of the system are untagged

natural language text from image captions in the 291 images. Outputs of the system are









semantic structures that contain information about the input text associated with the crop-

pest ontology.

[Figure: input (image captions) -> system (island chart parser using the crop-pest ontology and phrase patterns) -> output (semantic structure)]

Figure 5-1. The information extraction system in the crop-pest domain. The semantic structure can be classified as an individual of a particular class inside a domain-specific ontology, mapped from the natural language text in a given caption.

Ontology

Knowledge of the crop-pest domain is a vital component for building an

information extraction system. Domain experts who have extensive experience can build

this knowledge. Since every domain expert has a different way of presenting their

knowledge, a formalized form is necessary. One such form is an ontology, as was

introduced in Chapter 2.

Phrase Patterns

Phrase patterns are domain-specific patterns that represent the meaning of phrases (or single words) as well as their syntactic structure. Phrase patterns contain relationships among

phrases that are specified as "ATTRIBUTE" and "HEAD." The HEAD is the main

concept appearing as a single word in the phrase being analyzed, and ATTRIBUTE is a

modifier of the HEAD. For example, to represent "cotton leaf," we can create a phrase

pattern such as . In the phrase









pattern, np is a category of noun phrase that can be resolved using other patterns for noun phrases. In the case of "cotton leaf," each np in the pattern resolves to a single noun. The phrase pattern also describes the semantic structure of the phrase and the role of each word in that structure, as shown by the labels plant and plant part. The phrase pattern expresses the relationship between "cotton" and "leaf" with ATTRIBUTE and HEAD, since a leaf is part of cotton. This phrase pattern can be applied to other phrases as well, such as "cotton stem," "cotton root," and "peanut leaf." As shown in Figure 5-2, phrase

patterns are organized with a hierarchical structure. and become the

most specific phrase patterns in the hierarchy. The phrase pattern in the above example is

a child of the phrase pattern of , which is a child of .

These phrase patterns can be used as a grammar. A grammar is a formal description of the structures acceptable in a language (Allen, 1995). One type of grammar is context-free grammar, roughly defined as a grammar in which the left side of every rule has a single symbol. Context-free grammar is expressive enough to describe most of the structures in natural language (Allen, 1995); however, it is highly ambiguous. Phrase patterns can reduce such ambiguity by using words in contexts specific to a certain domain.
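The following is a minimal sketch of how a phrase-pattern hierarchy can act as a domain-specific grammar for two-word noun phrases such as "cotton leaf". It assumes a hypothetical is-a fragment (`IS_A`). Each pattern is a tuple of (ATTRIBUTE category, HEAD category, role). When several patterns match, the one closest to the words in the hierarchy, and therefore the most specific, is chosen. The actual patterns and their notation are richer than this.

```python
# Hypothetical is-a fragment of the crop-pest ontology: concept -> more general concept.
IS_A = {"cotton": "plant", "peanut": "plant",
        "leaf": "plant part", "stem": "plant part", "root": "plant part"}

# Phrase patterns as (ATTRIBUTE category, HEAD category, role of the ATTRIBUTE).
# The second pattern is a more specific child of the first, as in Figure 5-2.
PATTERNS = [("plant", "plant part", "plant"),
            ("cotton", "leaf", "plant")]


def generalizations(concept):
    """The concept followed by its increasingly general ancestors."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def match(phrase):
    """Return (pattern, semantic structure) for the most specific matching pattern, or None."""
    words = phrase.lower().split()
    if len(words) != 2:
        return None   # this sketch handles only "ATTRIBUTE HEAD" noun-noun phrases
    attr_chain, head_chain = generalizations(words[0]), generalizations(words[1])
    best = None
    for attr_cat, head_cat, role in PATTERNS:
        if attr_cat in attr_chain and head_cat in head_chain:
            # Fewer steps up the hierarchy means a more specific pattern.
            score = attr_chain.index(attr_cat) + head_chain.index(head_cat)
            if best is None or score < best[0]:
                structure = {"HEAD": words[1], role: words[0]}
                best = (score, (attr_cat, head_cat, role), structure)
    return None if best is None else best[1:]
```

With this data, "cotton leaf" selects the specific pattern, "peanut stem" falls back to the general plant/plant part pattern, and "leaf cotton" matches nothing. The ontology thus rules out readings that a bare context-free rule would accept.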







Figure 5-2. The hierarchical structure of phrase patterns for a concept "plant part". A
phrase pattern < n cotton ATTRIBUTE:cotton, n leaf HEAD> is more specific
pattern than , because a
cotton is a specific plant and a leaf is a specific plant part.










Island Chart Parser

A chart parser is a parser that uses a set of grammatical rules, a dictionary indicating each of the possible grammatical senses of each word, and a data structure called a "chart." A chart is a linear list of nodes and retains all edges. A conventional chart parser reads a phrase from a starting point, usually the beginning of a sentence, and extends the parse rightward. However, such a parser cannot recognize fragments of words (islands) inside a sentence, because it works from the starting point and parses in one direction only. The island chart parser was therefore introduced to parse bidirectionally and recognize such fragments: it can extend a parse in both the left and right directions.

The island chart parser parses the syntactic and semantic structure of an input string

with domain-specific rules. The phrase patterns are used as domain-specific rules

(grammar) for the parser. The parser consists of nodes, edges, and a chart. A node is a

point between two words in the input string that is being parsed. Each node holds information about all the edges coming into or going out of it. An edge is a data

structure that represents a complete or partial parsing of the input. It applies a particular

rule and is labeled by the rule. A chart is a linear list of nodes and retains all edges.

Figure 5-3 shows an edge that has a rule and four nodes in a chart.

The parsing procedure of the island chart parser has three steps. The first step is the

initialization of a chart with edges. The initialization is done by looking up each term in a

given input and retrieving all patterns associated with each term. The second step is

cycling on the edges generated in the first step. During this step, each edge is checked to see whether it can be extended by satisfying the rule associated with it. If the rule is completely satisfied, the edge becomes a complete edge; otherwise, it remains an incomplete edge, whose rule is not yet completely satisfied. A complete edge can also be used to extend other incomplete edges in a pending edge list. The third step is building parse trees from the edges that were completely parsed on the chart. The parser produces one or more parse trees as output, since one phrase can be interpreted in several ways due to the ambiguity of natural language (Seiffert, 1987). For each parse tree, a semantic structure is created to represent the meaning of the phrase.
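The three steps above can be illustrated with a compact island chart parser in Python. This sketch is not the parser used in this work. Rules are plain tuples with a marked HEAD symbol. An island is started wherever a complete constituent matches a rule's HEAD, and each edge is then extended leftward or rightward. The lexicon and the rule names, loosely modeled on Figure 5-3, are hypothetical.

```python
from collections import namedtuple

# A rule rewrites `lhs` as the symbols in `rhs`; `head` is the index of the HEAD symbol.
Rule = namedtuple("Rule", "lhs rhs head")
# An edge covers words[start:end] and has matched rhs[left:right] (the left and right dots).
Edge = namedtuple("Edge", "rule left right start end children")


def extend(edge, item):
    """Grow an edge rightward and/or leftward with an adjacent complete constituent."""
    cat, start, end, tree = item
    rhs, grown = edge.rule.rhs, []
    if edge.right < len(rhs) and rhs[edge.right] == cat and start == edge.end:
        grown.append(edge._replace(right=edge.right + 1, end=end,
                                   children=edge.children + (tree,)))
    if edge.left > 0 and rhs[edge.left - 1] == cat and end == edge.start:
        grown.append(edge._replace(left=edge.left - 1, start=start,
                                   children=(tree,) + edge.children))
    return grown


def island_parse(words, lexicon, rules, goal):
    """Return every parse tree of category `goal` that spans all of `words`."""
    complete, pending = [], []   # complete constituents; incomplete edges kept for later
    # Step 1: initialize the chart with one constituent per word sense in the lexicon.
    agenda = [(cat, i, i + 1, (cat, word))
              for i, word in enumerate(words) for cat in lexicon.get(word, ())]
    # Step 2: cycle until no new edge or constituent can be added.
    while agenda:
        item = agenda.pop()
        if isinstance(item, Edge):
            if item.left == 0 and item.right == len(item.rule.rhs):
                # The rule is completely satisfied: record a complete constituent.
                tree = (item.rule.lhs,) + item.children
                agenda.append((item.rule.lhs, item.start, item.end, tree))
            elif item not in pending:
                pending.append(item)
                for constituent in complete:
                    agenda.extend(extend(item, constituent))
        elif item not in complete:
            complete.append(item)
            cat, start, end, tree = item
            # An island starts wherever a constituent matches a rule's HEAD symbol.
            for rule in rules:
                if rule.rhs[rule.head] == cat:
                    agenda.append(Edge(rule, rule.head, rule.head + 1, start, end, (tree,)))
            for edge in pending:
                agenda.extend(extend(edge, item))
    # Step 3: collect the parse trees that span the whole input.
    return [tree for cat, start, end, tree in complete
            if cat == goal and start == 0 and end == len(words)]
```

As an example, take `lexicon = {"thrips": ["pest"], "egg": ["development stage"], "photograph": ["photograph"]}` with the rules `Rule("np pest", ("pest", "development stage"), 1)` and `Rule("np photograph", ("np pest", "photograph"), 1)`. Parsing "thrips egg photograph" for the goal "np photograph" starts an island at "photograph" and grows it leftward over "thrips egg".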

Semantic Structure

Semantic interpretation is the process of translating the parse trees produced by parsing into structured forms, called semantic structures. A semantic structure is an object representing a concept that can be automatically classified and added to the ontology. Each semantic structure 1) contains concepts that are associated with the crop-pest ontology, 2) is saved as an instance of the HEAD concept of the input string in the crop-pest ontology, and 3) can be used for retrieving images.

Indexing of images is a process in which important information is extracted from the image captions based on syntactic and semantic analysis. The information extraction system combines phrase patterns, the crop-pest ontology, and the island chart parser to produce a semantic structure for each image caption from syntactic and semantic analysis. These structures provide a rich description of each image. In addition, they are indirectly related to concepts associated with each caption. The semantic structures can therefore be used as image indexes that carry rich information for retrieving images.
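The Python sketch below illustrates how semantic structures can serve as indexes. It stores one nested structure per caption and separates direct matches, where the query is the HEAD, from indirect matches, where the query appears elsewhere in the structure. The structures are simplified, hypothetical renderings loosely based on Figures 4-3 and 5-4, and the image identifiers are made up.

```python
# Hypothetical semantic structures, one per caption: a HEAD concept plus role -> value,
# where a value may itself be a nested structure.
INDEX = {
    "image-1": {"HEAD": "adult",
                "organism": {"HEAD": "stainer", "host": "cotton"},
                "location": {"HEAD": "bloom", "color": "white", "plant": "cotton"}},
    "image-2": {"HEAD": "damage",
                "agent": "three cornered alfalfa hopper",
                "location": "soybean"},
}


def concepts(structure):
    """Every concept mentioned anywhere in a (possibly nested) semantic structure."""
    found = set()
    for value in structure.values():
        found |= concepts(value) if isinstance(value, dict) else {value}
    return found


def retrieve(index, concept):
    """Images whose caption HEAD is `concept` (direct) or that mention it elsewhere (indirect)."""
    direct = sorted(img for img, s in index.items() if s["HEAD"] == concept)
    indirect = sorted(img for img, s in index.items()
                      if s["HEAD"] != concept and concept in concepts(s))
    return direct, indirect
```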










[Figure: a chart for the input string "thrips egg photograph", with nodes between the words and one edge labeled with the dotted rule <.np pest ATTRIBUTE:object, np photograph HEAD.>]

Figure 5-3. Components of the island chart parser with a simple input string. This diagram shows only one edge, but there are many other edges for the phrase being analyzed. An edge has two dots (left and right), which show how many symbols in the pattern have been applied: the left dot shows how many symbols have been applied so far in the left direction and the right dot shows how many symbols have been applied so far in the right direction.

Results of the Information Extraction System

The information extraction system in the crop-pest domain was tested on 150 phrases from image captions. The island chart parser traversed the input strings using the crop-pest ontology and the phrase patterns. One of the parse trees is shown on the left side of Figure 5-4. The parse tree was converted to a semantic structure, shown on the right side of Figure 5-4. The semantic structure contains 1) words associated with the crop-pest ontology (bold), 2) the semantic relationships among them (italic), and 3) numbers that identify the synset representing the appropriate meaning of each word. The HEAD (underlined) of the phrase "cotton stainer adult in a white cotton bloom" is the concept "adult." Therefore, the semantic structure was saved as an instance of the concept "Image" in the crop-pest ontology. The instance of "adult" can be used for searching "adult" directly or related words indirectly.

The result of extracting information was that 130 of 150 phrases were parsed and

converted to semantic structures. The island chart parser could not parse the remaining 20

phrases because of the existence of words that were not associated with phrase patterns.












These failures fall into three groups:

1) 13 phrases containing plural nouns, even though the system includes phrase patterns that can determine the singular form.
2) Two phrases with conjunctions such as "and."
3) Five phrases that included parenthetical explanations, such as "wireworm larva (click beetle grubs) on peanut photograph."

The first two categories could be fixed by adding more patterns for detecting plural nouns and conjunctions. Items in the third category could be parsed successfully by rewriting the parenthetical phrases as synonyms. In the crop-pest domain, the information inside parentheses usually gives other common names or the scientific names of insects. These names are synonyms of the original names (e.g., wireworm larva = click beetle grub), and the crop-pest ontology can represent synonyms. With these changes, the information extraction system would be able to parse all 150 phrases.
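The fixes proposed above could be prototyped as a preprocessing step before parsing. The Python sketch below is hypothetical and deliberately naive. It records parenthetical names as candidate synonyms and singularizes words with a few suffix rules plus small exception lists, because names such as "thrips" are singular. It splits only a leading "X and Y ..." coordination.

```python
import re

IRREGULAR = {"leaves": "leaf", "larvae": "larva"}   # hypothetical irregular plurals
SINGULAR_S = {"thrips", "species"}                  # singular words that end in "s"


def singular(word):
    """Naive singularization with small exception lists."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word in SINGULAR_S or not word.endswith("s") or word.endswith("ss"):
        return word
    if word.endswith("ies"):
        return word[:-3] + "y"
    return word[:-1]


def preprocess(caption):
    """Return (phrases to parse, parenthetical names to record as candidate synonyms)."""
    caption = caption.lower()
    synonyms = re.findall(r"\(([^)]*)\)", caption)
    text = re.sub(r"\s*\([^)]*\)", "", caption)
    words = [singular(w) for w in text.split()]
    if len(words) > 3 and words[1] == "and":
        # Naive rule for "X and Y rest": parse "X rest" and "Y rest" separately.
        phrases = [" ".join([words[0]] + words[3:]), " ".join(words[2:])]
    else:
        phrases = [" ".join(words)]
    return phrases, [" ".join(singular(w) for w in s.split()) for s in synonyms]
```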



[Figure: the left panel shows the parse tree of "cotton stainer adult in a white cotton bloom". The right panel shows the semantic structure, with HEAD adult (46) and the related terms organism: stainer (615), host: cotton (74), location: in (296), plant part: bloom (82), plant: cotton (74), color: white (916), and determiner: a (731).]

Figure 5-4. A parse tree (left) and the semantic structure (right) based on the parse tree for the caption "cotton stainer adult in a white cotton bloom." The numbers appearing in the right panel indicate that each term was mapped to classes in the crop-pest ontology.









Conclusion

An information extraction system was developed with the crop-pest ontology and

phrase patterns as domain-specific knowledge sources. This system provides a semi-

automatic approach to the creation of indexes based on syntactic and semantic analysis of

natural language. The system was then tested with 150 image captions. One hundred and

thirty phrases (86.67%) were parsed and converted to semantic structures successfully.

Phrase patterns were constructed manually for the information extraction system, and this manual building of phrase patterns is very tedious. In future work, automatically building phrase patterns from existing phrase patterns will be explored for the information extraction system.














CHAPTER 6
CONCLUSION AND FUTURE DIRECTIONS

A new approach to retrieving images with the help of an ontology is presented here. A total of 291 images describing crops and related pests (the "crop-pest domain") were used to develop the approach. An ontology called the "crop-pest ontology" was built for retrieving the 291 images, covering concepts in the crop-pest domain. It provides a formal description of the concepts in the crop-pest domain and supports reasoning processes, such as image search, based on this formal structure. A practical methodology for developing the crop-pest ontology was suggested according to the principles, and each step of the development was explained with specific examples. The crop-pest ontology contains 286 classes, 81 object properties, 36 datatype properties, and 305 individuals. The top level of the ontology was imported from the Suggested Upper Merged Ontology (SUMO), which is a good example of reusing an existing ontology. The consistency of the crop-pest ontology was checked using the OWL reasoner (an open-source, Java-based OWL DL reasoner) as a validation of the ontology. The validation showed that the classes, properties, and individuals in the crop-pest ontology are logically consistent.

A complete ontology can not only support its intended applications and function properly, but can also be reused for the development of other ontologies. Therefore, evaluating the crop-pest ontology is an essential process. The crop-pest ontology was evaluated using a quantitative approach that tests the coverage of the ontology against a domain-specific corpus. A total of 138 terms from a domain-specific corpus were tested to check the coverage of the crop-pest ontology in comparison with the well-known agricultural ontology AGROVOC. The crop-pest ontology covered 44.93% of the tested terms, while AGROVOC covered 30.43%. This indicates that the crop-pest ontology covers the domain-specific corpus better than AGROVOC does. Therefore, the crop-pest ontology can support ontology-assisted image retrieval in the crop-pest domain.
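Coverage here is the share of tested corpus terms that the ontology contains. The sketch assumes a simple case-insensitive exact match, which may differ from how terms were actually counted. For 138 tested terms, the reported figures correspond to 62 terms (44.93%) for the crop-pest ontology and 42 terms (30.43%) for AGROVOC.

```python
def coverage(tested_terms, ontology_terms):
    """Percentage of tested terms found among ontology concept names (case-insensitive)."""
    names = {term.lower() for term in ontology_terms}
    covered = sum(1 for term in tested_terms if term.lower() in names)
    return 100.0 * covered / len(tested_terms)


# Arithmetic check against the reported percentages for 138 tested terms.
print(round(100 * 62 / 138, 2))   # 44.93
print(round(100 * 42 / 138, 2))   # 30.43
```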

Jacob's claim that "a controlled vocabulary is itself an ontology" motivated a practical comparison between the National Agricultural Library Thesaurus (NALT) and the crop-pest ontology. The comparison considered two categories: the ability to represent domain knowledge, and the ability to reason over that representation. Because NALT represents domain knowledge only through simple relations such as BT (broader term) and RT (related term), its relations are ambiguous. In addition, NALT is not described in a formal language, so a reasoner cannot access its components or perform any reasoning over them. The crop-pest ontology, by contrast, is written in a formal language, so a reasoner can deduce new conclusions from statements asserted to be true. This reasoning ability also supports information search with high precision and recall, and it provides automatic validation of logical consistency. The practical comparison therefore concludes that the crop-pest ontology offers a better representation of domain knowledge and greater reasoning power over that representation, which could improve search techniques in agricultural information systems.
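The sketch below makes the contrast concrete with a hypothetical fragment in which every relation has an explicit type and direction, unlike NALT-style BT/RT links. Because subClassOf is treated as transitive, the code deduces that western flower thrips is an insect, even though that fact is never stated directly. A typed "attacks" relation (a hypothetical property name) then lets a query ask specifically for insect pests of cotton. A thesaurus might link the same pair of terms only by RT, which does not say how they are related. This hand-rolled closure is only an illustration; the crop-pest ontology itself relies on OWL and an OWL DL reasoner.

```python
# Hypothetical fragment: typed, directed relations as (subject, predicate, object).
TRIPLES = {
    ("WesternFlowerThrips", "subClassOf", "Thrips"),
    ("TobaccoThrips", "subClassOf", "Thrips"),
    ("Thrips", "subClassOf", "Insect"),
    ("Bollworm", "subClassOf", "Insect"),
    ("WesternFlowerThrips", "attacks", "Cotton"),
    ("Bollworm", "attacks", "Cotton"),
    ("Thrips", "attacks", "Peanut"),
}


def superclasses(cls):
    """Transitive closure of subClassOf: every class that cls is (indirectly) a kind of."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def pests_of(crop, kind):
    """Classes asserted to attack `crop` that are (directly or indirectly) a `kind`."""
    return sorted(
        s for s, p, o in TRIPLES
        if p == "attacks" and o == crop and kind in superclasses(s)
    )


print(sorted(superclasses("WesternFlowerThrips")))  # ['Insect', 'Thrips']
print(pests_of("Cotton", "Insect"))  # ['Bollworm', 'WesternFlowerThrips']
```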

Indexing images is vitally important for supporting image browsing. The current technique of indexing images with a thesaurus can overlook information in an image that may be essential for indexing it. Because the crop-pest ontology could remedy the thesaurus's deficiencies in describing domain knowledge, this research explored indexing images with the crop-pest ontology. The indexing process is as follows:

* Manually analyzing the syntax and semantics of each image's caption
* Creating an individual for the image in the crop-pest ontology
* Filling in the property values of the individual
* Relating the assigned values to classes/individuals in the crop-pest ontology
* Saving the individual in the crop-pest ontology
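The five steps can be sketched as follows. This minimal sketch rests on several assumptions. The ontology is modeled as a plain Python dictionary, and step 1 is done by hand, so the analyzed phrases are passed in directly. The property names depictsPest and depictsCrop, the image identifier, and the phrase-to-class table are all hypothetical and are not taken from the crop-pest ontology.

```python
from dataclasses import dataclass, field

# Hypothetical table relating caption phrases to ontology classes.
ONTOLOGY_CLASSES = {
    "thrips": "Thrips",
    "peanut": "Peanut",
    "cotton": "Cotton",
    "fall armyworm": "FallArmyworm",
}


@dataclass
class ImageIndividual:
    """An image stored as an individual; property names are hypothetical."""
    image_id: str
    caption: str
    properties: dict = field(default_factory=dict)


def index_image(ontology, image_id, caption, analyzed_values):
    """Steps 2-5 of the indexing process; step 1 (caption analysis) is done by hand.

    analyzed_values maps a property name to the phrase found in the caption.
    """
    # Step 2: create an individual for the image.
    individual = ImageIndividual(image_id, caption)
    for prop, phrase in analyzed_values.items():
        # Steps 3 and 4: fill the property value and relate it to an ontology
        # class (a KeyError signals a phrase the ontology does not cover).
        individual.properties[prop] = ONTOLOGY_CLASSES[phrase.lower()]
    # Step 5: save the individual into the toy ontology store.
    ontology[image_id] = individual
    return individual


store = {}
# Step 1, done manually: "Thrips on peanut photograph" -> pest "Thrips", crop "peanut".
ind = index_image(store, "img004", "Thrips on peanut photograph",
                  {"depictsPest": "Thrips", "depictsCrop": "peanut"})
print(ind.properties)  # {'depictsPest': 'Thrips', 'depictsCrop': 'Peanut'}
```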

The index produced for each image by steps 1 through 5 was saved in the crop-pest ontology as an individual, and these indexes can be retrieved for browsing images. Browsing images through the crop-pest ontology called for a new interface, which was created for this purpose. Its goal is to support image browsing while avoiding negative consequences such as empty result sets or the feeling of being lost. In addition, it lets users acquire the domain knowledge described by the crop-pest ontology while browsing concepts or images. The new interface supports the following features:

* Browsing images
* Visualizing the hierarchical structure of classes in the crop-pest ontology
* Showing the properties of the crop-pest ontology
* Showing the relationships between each image and concepts in the crop-pest ontology
* Finding all images related to a particular class
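The last feature, finding all images related to a class, benefits from the class hierarchy. A search for a general class should also return images indexed under its subclasses, which is one way an ontology-backed interface can avoid empty result sets. The following is a minimal sketch that assumes a hypothetical hierarchy and image index. The class names and image identifiers are illustrative and are not the crop-pest ontology's actual names.

```python
# Hypothetical class hierarchy: child class -> parent class.
PARENT = {
    "WesternFlowerThrips": "Thrips",
    "TobaccoThrips": "Thrips",
    "Thrips": "Insect",
    "Bollworm": "Insect",
    "SpiderMite": "Mite",
}

# Hypothetical image index: image id -> class of the pest it depicts.
IMAGE_PEST = {
    "img004": "Thrips",
    "img167": "WesternFlowerThrips",
    "img168": "TobaccoThrips",
    "img193": "Bollworm",
    "img267": "SpiderMite",
}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or lies anywhere below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def images_for_class(target):
    """All images whose depicted class is target or one of its subclasses."""
    return sorted(img for img, cls in IMAGE_PEST.items() if is_subclass(cls, target))


print(images_for_class("Thrips"))  # ['img004', 'img167', 'img168']
print(images_for_class("Insect"))  # ['img004', 'img167', 'img168', 'img193']
```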

A usability study was conducted as an online evaluation comparing the new interface, which retrieves images with the crop-pest ontology, with a conventional keyword-based search interface. This preliminary study indicates that participants encountered fewer empty results when retrieving images with the crop-pest ontology. It also indicates that ontology-based image retrieval helps users find relevant images by transferring domain knowledge to them.

An information extraction system was developed that uses the crop-pest ontology and phrase patterns as domain-specific knowledge sources. The system provides a semi-automatic approach to creating indexes based on syntactic and semantic analysis of natural language. It was tested with 150 image captions, and 130 phrases (86.67%) were successfully parsed and converted into semantic structures. The phrase patterns for the system were constructed manually, which is very tedious. Future work will explore building phrase patterns automatically from existing ones.
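The role of phrase patterns can be illustrated with a minimal sketch. It assumes that patterns are regular expressions with named slots and that a successful match yields a semantic structure in the form of a dictionary. The three patterns and the slot names "pest" and "crop" are hypothetical and much simpler than the system's actual patterns. They also omit the crop-pest ontology, which the system uses as a second knowledge source. The sample captions come from Appendix A, and the last one is deliberately left unmatched. The success rate is the same kind of ratio as the reported result: 130/150 ≈ 86.67%.

```python
import re

# Hypothetical hand-written phrase patterns; each maps a caption shape to
# named slots that form a small semantic structure. Order matters: the
# more specific patterns are tried first.
PHRASE_PATTERNS = [
    re.compile(r"^damage caused by (?P<pest>[\w\s-]+?) on (?P<crop>\w+)", re.I),
    re.compile(r"^(?P<pest>[\w\s-]+?) damage (?:to|on) (?P<crop>\w+)", re.I),
    re.compile(r"^(?P<pest>[\w\s()-]+?) on (?P<crop>\w+)", re.I),
]


def extract(caption):
    """Slot values from the first matching pattern, or None if none matches."""
    text = caption.strip().rstrip(".")
    for pattern in PHRASE_PATTERNS:
        match = pattern.search(text)
        if match:
            return {k: v.strip().lower() for k, v in match.groupdict().items()}
    return None


captions = [
    "Thrips on peanut photograph",
    "Damage caused by cutworm on peanut photograph.",
    "Lesser cornstalk borer damage on soybean.",
    "Pinned specimens of adult european corn borer moths.",
]
results = [extract(c) for c in captions]
parsed = sum(r is not None for r in results)
print(results[0])  # {'pest': 'thrips', 'crop': 'peanut'}
print(f"{parsed}/{len(captions)} parsed ({100 * parsed / len(captions):.2f}%)")  # 3/4 parsed (75.00%)
```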

APPENDIX A
291 IMAGE CAPTIONS

1: photograph of insect pest
2: photograph of pest of agronomic crops
3: photograph of peanut pest
4: Thrips on peanut photograph
5: Rednecked Peanutworm on peanut photograph
6: Leafhoppers on peanut photograph
7: Hepperburn from leafhoppers on peanut photograph
8: Three-cornered Alfalfa Hopper on peanut photograph
9: Lesser Cornstalk Borer on peanut photograph
10: Whitefringed Beetle Grub on peanut photograph
11: Southern Corn Rootworm (Spotted Cucumber Beetle Larva) on peanut photograph
12: Wireworm Larva (Click Beetle grubs) on peanut photograph
13: Corn Earworm on peanut photograph
14: Fall Armyworm on peanut photograph
15: Damage caused by fall armyworm on peanut photograph
16: Stink Bug on peanut photograph
17: Cutworm on peanut photograph
18: Damage caused by cutworm on peanut photograph.
19: Spider Mites on peanut photograph
20: Southern Armyworm on peanut photograph
21: Velvetbean Caterpillar on peanut photograph
22: soybean pest photograph
23: Lesser Cornstalk Borer on soybean photograph
24: Lesser Cornstalk Borer damage on soybean photograph
25: Whitefringed Beetle on soybean photograph
26: Three-cornered Alfalfa Hopper on soybean photograph
27: Three-cornered alfalfa hopper damage on soybean photograph
28: Velvetbean Caterpillar on soybean photograph
29: Looper on soybean photograph
30: Green Cloverworm on soybean photograph
31: Beet Armyworm on soybean photograph
32: Fall Armyworm on soybean photograph
33: Corn Earworm on soybean photograph
34: Stink Bug on soybean photograph
35: Damage by Stink Bug on soybean photograph
36: Bean Leaf Beetle on soybean photograph
37: Soybean Stem Borer on soybean photograph
38: Grasshopper on soybean photograph
39: Yellow-striped Armyworm on soybean photograph
40: Mexican Bean Beetle on soybean photograph
41: Blister Beetle on soybean photograph
42: Snowy Tree Cricket on soybean photograph
43: cotton pest photograph
44: Beet Armyworm on cotton photograph
45: Thrips on cotton photograph
46: Damage by thrips on cotton photograph
47: Tarnished Plant Bug on cotton photograph
48: Damage by tarnished plant bug on cotton photograph.
49: Bollworm on cotton photograph
50: Cotton Aphid on cotton photograph
51: Fall Armyworm on cotton photograph
52: Looper on cotton photograph
53: Cotton Leafworm on cotton photograph
54: Whitefly on cotton photograph
55: Stink Bug on cotton photograph
56: Damage by stink bug on cotton photograph
57: European Corn Borer on cotton photograph
58: Boll Weevil on cotton photograph
59: Cutworm on cotton photograph
60: Spider Mite on cotton photograph
61: Cotton Stainer on cotton photograph
62: White Fringed Beetle on cotton photograph
63: Southern Armyworm on cotton photograph
64: Cotton Fleahopper on cotton photograph
65: Leafminer on cotton photograph
66: Flea Beetle on cotton photograph
67: Sugarcane beetle on cotton photograph
68: Cotton Square Borer on cotton photograph
69: Tobacco Budworm on cotton photograph
70: Thrips damage to peanut leaves.
71: Closeup of adult thrips on peanut leaf.
72: Rednecked peanutworm and damage on peanut.
73: Rednecked peanutworm in peanut bud.
74: Hopperburn caused by leafhoppers on peanut leaves.
75: Closeup on hopperburn caused by leafhoppers on peanut leaf.
76: Overview of hopperburn on peanut caused by leafhoppers.
77: Adventitious root growth on peanut caused by three-cornered alfalfa girdling.
78: Lesser cornstalk borer silken feeding tubes on peanut pegs.
79: Lesser cornstalk borer adult moths (male left, female right).
80: Closeup of lesser cornstalk borer larva on peanut leaf.
81: Whitefringed beetle grub in soil at base of peanut plant.
82: Spotted cucumber beetle (Southern Corn Rootworm adult) on peanut leaf.
83: Southern corn rootworm (Spotted cucumber beetle larva) on peanut peg.
84: Southern corn rootworm (Spotted cucumber beetle larva) damage to peanut pod.
85: Wireworm larva on soil in peanut field.
86: Medium corn earworm larva on edge of peanut leaf.
87: Large corn earworm larva on peanut leaf.
88: Profile of small corn earworm larva on edga of peanut leaf.
89: Fall armyworm egg mass on peanut leaf.
90: Small armyworm larva and minor damage on peanut leaf.
91: Hatching fall armyworm egg mass on peanut.
92: Fall armyworm damage to peanut bud.
93: Stink bug egg mass in peanut leaf.
94: Cutworm damage to peanut pod.
95: Spider mites on peanut leaf.
96: Large southern armyworm larva on peanut leaf.
97: Dark phase of velvetbean caterpillar on peanut stem.
98: Dark phase of velvetbean caterpillar on peanut stem.
99: Lesser cornstalk borer larva with damage on soybean.
100: An adult lesser cornstalk borer moth on the ground beneath a soybean plant.
101: Lesser cornstalk borer larva on soil beneath soybean.
102: Lesser cornstalk borer damage on soybean.
103: Whitefringed beetle grub in soil under soybean plant.
104: Adult whitefringed beetle with damage on soybean.
105: Adult whitefringed beetle on soybean.
106: Three-cornered alfalfa hopper nymph on soybean.
107: Adult three-cornered alfalfa hopper on soybean stem.
108: Damage caused by three-cornered alfalfa hopper on soybean.
109: Small velvetbean caterpillar larvae on soybean.
110: Large velvetbean caterpillar larva on soybean.
111: Looping velvetbean caterpillar larva on edge of soybean leaf.
112: Adult velvetbean caterpillar moth resting on soybean leaf.
113: Velvetbean caterpillar larva on soybean.
114: Adult velvetbean caterpillar moth on soybean.
115: Dark phase of velvetbean caterpillar on soybean.
116: Looper larva on soybean leaf.
117: Large looper larva on soybean leaf.
118: Large looper on soybean leaf.
119: Close-up of large looper larva on soybean leaf.
120: Large green cloverworm larva on soybean leaf.
121: Green cloverworm larva on soybean leaf.
122: Adult green cloverworm moth on soybean.
123: Beet armyworm larva on soybean.
124: Beet armyworm larva curled up on soybean leaf.
125: Adult beet armyworm moth on soybean leaf.
126: Large beet armyworm larva on soybean.
127: Large fall armyworm larva on soybean.
128: Adult fall armyworm moth on soybean.
129: Small corn earworm larva on edge of soybean leaf.
130: Corn earworm larva on soybean foliage.
131: Corn earworm larva on soybean pod.
132: Close-up of adult corn earworm moth on soybean
133: Stink bug nymphs on soybean pod.
134: Stink bug nymphs on dried soybeans.
135: Southern green stink bug on damaged soybean leaf.
136: Close-up southern green stink bug on soybean leaf. Notice the dried soybean laying on leaf for size comparison.
137: Small stink bug nymphs next to egg mass on soybean leaf.
138: Small stink bug nymphs on egg mass on soybean leaf.
139: Southern green stink bug nymph on soybean leaf.
140: Close-up on black stink bug on soybean. Notice the dried soybean for size comparison.
141: Stink bug egg masses on soybean leaf.
142: Adult brown stink bug on soybean.
143: Soybean pod damage caused by stink bug.
144: Soybean pod damage caused by stink bug on soybean.
145: Adult bean leaf beetle with damage on soybean.
146: Bean leaf beetle adult with damage on soybean.
147: Soybean stem borer adult on soybean stem.
148: Soybean stem borer larva in damaged soybean stem.
149: Grasshoppers on soybean leaf.
150: Close-up of grasshopper on soybean stem.
151: Large yellow-striped armyworm on soybean.
152: Mexican bean beetle eggs on underside on soybean leaf.
153: Mexican bean beetle larva on soybean.
154: Pupa of mexican bean beetle on soybean leaf.
155: Adult mexican bean beetle on soybean leaf.
156: Adult blister beetle on soybean leaf.
157: Snowy tree cricket on soybean.
158: Healthy and parasitized beet armyworm larvae with feeding damage on cotton.
159: Large beet armyworm larva behind bract of cotton bloom.
160: Hatching beet armyworm egg mass on cotton.
161: Small beet armyworm feeding in cotton leaf.
162: Beet armyworm larvae feeding in cotton bloom.
163: Beet armyworm pupa on soil at base of cotton plant.
164: Thrips on cotyledon leaf of cotton.
165: Thrips damage to the anthers of a cotton flower.
166: Adult and immature western flower thrips on cotton.
167: Western flower thrips on cotton.
168: Tobacco thrips adult on cotton.
169: Cotton seedling damaged by thrips.
170: Tarnished plant bug on cotton bract.
171: Tarnished plant bug nymph on cotton bract.
172: Two-day old tarnished plant bug egg.
173: Six-day old tarnished plant bug egg.
174: Mid-season plant bug damage to white cotton bloom.
175: Damage caused by a tarnished plant bug on a pinhead square.
176: Bollworm egg at the base of a cotton square.
177: Bollworm egg on cotton leaf.
178: Small bollworm on a small cotton square.
179: Small bollworm on terminal of cotton plant.
180: Four day old bollworm on cotton.
181: Five to six day old bollworm on cotton.
182: Small bollworm in white cotton bloom.
183: 4-day old bollworm larva in dried bloom tag with cotton boll damage.
184: Bollworm damage to small cotton boll under bloom tag.
185: Bollworm egg on brown cotton bloom tag.
186: 6-day old bollworm larva in white cotton bloom.
187: 6-day old bollworm on small cotton boll.
188: Small bollworm on cotton square showing damage.
189: Bollworm feeding through the white bloom into the cotton boll.
190: 4- to 5-day old bollworm larva under the cotton bloom tag with boll damage.
191: Old boll tip feeding damage from a bollworm on a large boll.
192: Old boll tip feeding damage from a bollworm on a mature boll.
193: Bollworm adult moth on cotton leaf.
194: Large bollworm larva on cotton stem.
195: Closeup of head and thorax of a bollworm larva.
196: Large bollworm larva under cotton bloom tag.
197: Bollworm egg on dried cotton bloom.
198: Bollworm eggs on cotton leaf.
199: Cotton bollworm larva on boll in Bt cotton.
200: Bollworm egg on brown bloom tag.
201: Bollworm larva feeding in cotton bloom.
202: Bollworm larva under brown bloom tag.
203: Cotton aphids on cotton leaf.
204: Cotton aphid honey dew and cupped cotton leaves.
205: Cotton aphids with fungus disease.
206: Sooty mold on cotton lint caused by aphid honeydew.
207: Four day old fall armyworm larva feeding on boll bract.
208: Large fall armyworm larva behind the bract of white cotton bloom.
209: 3-day old fall armyworm larva on cotton boll bract.
210: Fall armyworm on small cotton square with damage.
211: Fall armyworm damage to cotton boll bract.
212: Small fall armyworm on cotton square in top of plant.
213: Fall armyworm egg mass on cotton leaf.
214: Small fall armyworm in white cotton bloom.
215: Small fall armyworm larva with feeding damage to cract calyx and boll.
216: Small fall armyworm larva with feeding damage on bract.
217: Bract etching by fall armyworm.
218: Fall armworm larva boring into cotton stem.
219: Fall armyworm larva feeding in white bloom.
220: General view of looper feeding damage on cotton.
221: Soybean Looper on cotton leaf.
222: Soybean looper feeding on cotton leaf.
223: Soybean looper pupa on a cotton leaf.
224: Cabbage looper on cotton leaf.
225: Large soybean looper larva on cotton leaf.
226: Large cabbage looper larva on cotton leaf.
227: Soybean looper pupa on cotton leaf.
228: Soybean looper larva on cotton leaf.
229: Cotton leafworm larva with feeding damage on cotton.
230: Cotton leafworm larva and damage.
231: Cotton leafworm larva with damage.
232: Banded-winged whitefly adults and eggs on cotton.
233: Banded-winged whitefly pupae on cotton leaf.
234: Sweet potato whitefly on the underside of a cotton leaf.
235: Closeup of adults and eggs of the sweet potato whitefly.
236: Closeup of sweet potato larvae and pupae on cotton.
237: Silverleaf whitefly on cotton leaf.
238: Southern green stink bug on cotton leaf.
239: Southern green stink bug on cotton boll with feeding damage.
240: Stink bug injury to cotton boll.
241: no description
242: Wart on inside of cotton boll from stink bug injury
243: Outside boll blemish and brown lint from stink bug damage.
244: Boll rot caused by stink bug feeding.
245: Stink bug damaged boll versus normal open boll.
246: Southern green stink bug nymph and damaged boll.
247: Southern green stink bug nymph on cotton leaf.
248: Adult stink bug feeding on cotton boll.
249: Southern green stink bug adult feeding on cotton boll.
250: Stink bug damage to a 4-day old boll.
251: Dissected 4-day old boll showing internal stink bug damage.
252: no image caption
253: Pinned specimens of adult european corn borer moths.
254: European corn borer egg mass on leaf.
255: European corn borer larva in cotton stem.
256: European corn borer larva in cotton boll.
257: European corn borer larva and boll damage.
258: European corn borer damage to cotton bolls.
259: Boll weevil on cotton boll.
260: Boll weevil pupa in cotton square.
261: Cotton square punctured by boll weevil.
262: Cutworm on soil curled in C-shape.
263: Cutworm in soil at base of cotton plant.
264: Cutworm in soil at the base of cotton planted in wheat stubble.
265: Cutworm in soil with damaged cotton plant.
266: Spider mite damage to cotton leaf.
267: Spider mites on underside of cotton leaf.
268: Cotton stainer adult in a white cotton bloom.
269: Cotton stainer nymphs on rotted cotton boll.
270: Cotton stainer nymph on cotton leaf.
271: White fringed beetle grub and damage to cotton seedling.
272: White fringed beetle grub damage to cotton.
273: Closeup of white fringed beetle grub feeding damage to cotton
274: Above ground symptoms of white fringed beetle grub feeding.
275: Southern armyworm egg mass on cotton leaf.
276: Small southern armyworm larvae on cotton leaf with feeding damage.
277: Southern armyworm (early instar) on cotton leaf with damage.
278: Late instar southern armyworm on cotton leaf.
279: Several color variations of late instar southern armyworms.
280: Large southern armyworm larva on cotton leaf.
281: Southern armyworm larva on cotton leaf.
282: Small southern armyworm larvae with damage.
283: Large southern armyworm larvae showing color variations.
284: Cotton fleahopper on cotton leaf.
285: Leafminer damage to cotton leaf.
286: Leafminer damage to cotton.
287: Flea beetle and damage to cotton leaf.
288: Sugarcane beetle larva on boll in Bt cotton.
289: Sugarcane beetle on finger.
290: Cotton square borer larva and damage.
291: Tobacco budworm moth on cotton leaf. possibility of picture frames

APPENDIX B
A LIST OF WORDS APPEARING ON 291 IMAGE CAPTIONS


head 1
calyx 1
silverleaf whitefly 1
internal 1
normal 1
pinhead bollworm 1
parasitized 1
boll cutworm 1
resting 1
tubes 1
above 1
immature 1
field 1
open 1
medium 1
armyworms 1
3-day 1
girdling 1
etching 1
finger 1
planted 1
dew 1
versus 1
stubble 1
laying 1
brown stink bug 1
inside 1
several 1
cupped 1
view 1
soybean. 1
foliage 1
masses 1
disease 1
six 1
right) 1
beet 1
pinned 1
dissected 1
wart 1
boring 1
female 1
six-day 1
large yellow-striped armyworm 1
an 1
top 1
tobacco 1
rot 1
five 1
4- 1
flea 1
fall 1
outside 1
black stink bug 1
cornstalk 1
rotted 1
5-day 1
edga 1
caterpillar 1
c-shape 1
minor 1
cract 1
growth 1
plant bug 1
bolls 1
hepperburn 1
aphid 1
velvetbean 1
next 1
specimens 1
anthers 1
adventitious 1
pegs 1
wheat 1
fall armworm 1
blemish 1
sooty 1