Information Retrieval with Concept Discovery in Digital Collections for Agriculture and Natural Resources

Permanent Link: http://ufdc.ufl.edu/UFE0042551/00001

Material Information

Title: Information Retrieval with Concept Discovery in Digital Collections for Agriculture and Natural Resources
Physical Description: 1 online resource (171 p.)
Language: english
Creator: ZIEMBA,LUKASZ W
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2011

Subjects

Subjects / Keywords: COMPUTATIONAL -- EXTRACTION -- INFORMATION -- KNOWLEDGE -- LINGUISTICS -- ONTOLOGY -- ORGANIZATION -- RETRIEVAL
Agricultural and Biological Engineering -- Dissertations, Academic -- UF
Genre: Agricultural and Biological Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The amount and complexity of information available in digital form is already huge, and new information is being produced every day. Retrieving information relevant to a particular need becomes a significant issue. This work utilizes knowledge organization systems (KOS), such as thesauri and ontologies, and applies information extraction (IE) and computational linguistics (CL) techniques to organize, manage and retrieve information stored in digital collections in the agricultural domain. Two real-world applications of the approach have been developed and are available and actively used by the public. An ontology is used to manage the Water Conservation Digital Library, which holds a dynamic collection of various types of digital resources in the domain of urban water conservation in Florida, USA. The ontology-based back end powers a fully operational web interface, available at http://library.conservefloridawater.org. The system has demonstrated numerous benefits of the ontology application, including accurate retrieval of resources, information sharing and reuse, and has proved to effectively facilitate information management. The major difficulty encountered with the approach is that the large and dynamic number of concepts makes it difficult to keep the ontology consistent and to accurately catalog resources manually. To address these issues, a combination of IE and CL techniques, such as the Vector Space Model and probabilistic parsing, together with an agricultural thesaurus, was adapted to automatically extract the concepts important for each of the texts in the Best Management Practices (BMP) Publication Library, a collection of documents in the domain of agricultural BMPs in Florida, available at http://lyra.ifas.ufl.edu/LIB. A new approach to domain-specific concept discovery using an Internet search engine was developed. Initial evaluation of the results indicates significant improvement in the precision of information extraction.
The approach presented in this work focuses on problems unique to the agriculture and natural resources domain, such as domain-specific concepts and vocabularies, but should be applicable to any collection of texts in digital format. It may be of interest to anyone who needs to effectively manage a collection of digital resources.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by LUKASZ W ZIEMBA.
Thesis: Thesis (Ph.D.)--University of Florida, 2011.
Local: Adviser: Beck, Howard W.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2011
System ID: UFE0042551:00001


Full Text

PAGE 1

INFORMATION RETRIEVAL WITH CONCEPT DISCOVERY IN DIGITAL COLLECTIONS FOR AGRICULTURE AND NATURAL RESOURCES

By

LUKASZ ZIEMBA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011

PAGE 2

© 2011 Lukasz Ziemba

PAGE 3

To ZZ

PAGE 4

ACKNOWLEDGMENTS

I would like to thank my advisor Howard Beck for his constant support and encouragement over the past five years. I could not have achieved this goal without his patience, guidance, and persistent motivation. For providing innumerable helpful comments and helping to guide this research, I also thank all the members of my graduate committee: Antonio Arroyo, Michael Dukes, Dorota Haman, Laurie Taylor and Fedro Zazueta. I would also like to thank the Conserve Florida Water Clearinghouse and the Best Management Practices Group at the Institute of Food and Agricultural Sciences at UF for sponsoring this project. Special thanks to my wife Zuzanna for her invaluable support during this PhD journey. To all my friends in the Agricultural and Biological Engineering and Environmental Engineering Sciences Departments at UF: thank you for making this department the greatest work environment ever. Last, but not least, I would like to thank my parents for their understanding and support.

PAGE 5

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 7
LIST OF FIGURES .... 8
LIST OF ABBREVIATIONS .... 10
ABSTRACT .... 11

CHAPTER

1 INTRODUCTION .... 13
    Knowledge Organization Systems .... 14
        Thesaurus .... 15
        Ontology .... 17
    Information Extraction .... 19
    Computational Linguistics .... 21
        Tokenization .... 23
        Stop Words .... 24
        Stemming and Lemmatization .... 25
        Parsing .... 26
    Applications .... 27
    Contribution .... 28

2 RELATED WORK .... 32
    Agriculture and Natural Resources Applications .... 32
    Other Work in IE and CL .... 36
    VIVO .... 38
    Summary .... 41

3 ONTOLOGY APPLICATION METHODOLOGY .... 43
    Ontology Development .... 43
        Specification .... 44
        Conceptualization and Formalization .... 44
        Implementation and Maintenance .... 45
        Knowledge Acquisition .... 45

PAGE 6

    Ontology Application .... 46
        Top Level Ontology .... 47
        Domain Specific Ontology .... 47
        Ontology Assisted Search .... 48
        System Architecture .... 49

4 INFORMATION EXTRACTION METHODOLOGY .... 54
    Initial Construction .... 54
    Title Parsing With the Stanford Parser .... 55
    Frequency Analysis .... 57
        Concept Extraction .... 57
        Semantic Similarity .... 58
        Document Concept Relevance .... 60

5 RESULTS AND DISCUSSION .... 65
    Water Conservation Ontology .... 65
    Ontology Assisted Search .... 66
    Title Parsing .... 69
    Frequency Analysis .... 71

6 CONCLUSIONS AND FUTURE WORK .... 88
    Conclusions .... 88
    Future Work .... 90

APPENDIX

A TITLE PARSING RESULTS - WATER CONSERVATION DIGITAL LIBRARY .... 92
B TITLE PARSING RESULTS - BMP PUBLICATION LIBRARY .... 122
C FREQUENCY ANALYSIS RESULTS - BMP PUBLICATION LIBRARY .... 156

LIST OF REFERENCES .... 164
BIOGRAPHICAL SKETCH .... 171

PAGE 7

LIST OF TABLES

Table, page

1-1 Agricultural thesauri in English language .... 29
3-1 [...] .... 50
3-2 [...] .... 50
4-1 Summary of concept extraction .... 62
4-2 Definition of document by concept binary matrix .... 62
4-3 Example of document by concept binary matrix .... 63
4-4 Concept by concept similarity matrix .... 63
4-5 Document by concept frequency matrix .... 63
4-6 TF-IDF matrix .... 63
4-7 Vote adjusted frequency matrix .... 63
4-8 Vote adjusted TF-IDF matrix .... 64
5-1 [...] .... 76
5-2 [...] .... 77
5-3 Ten most frequent concepts extracted from two corpora .... 78
5-4 Concepts similar to citrus obtained using 2 corpora and 2 measures .... 79
5-5 Concepts similar to nitrogen obtained using 2 corpora and 2 measures .... 79
5-6 Concepts similar to irrigation obtained using 2 corpora and 2 measures .... 80

PAGE 8

LIST OF FIGURES

Figure, page

1-1 Prehistoric petroglyphic imagery from Western U.S. .... 29
1-2 [...] .... 30
1-3 Information extraction: from natural language to KOS .... 30
1-4 Co[...]ak thereof one .... 30
1-5 A 2-dimensional space with vectors one = (1, 1) and speak = (0, 1) .... 31
2-1 Visualization using a VIVO co-author network .... 42
3-1 ObjectEditor authoring tool .... 51
3-2 Book individual in OWL .... 51
3-3 Creating an individual of organization class .... 52
3-4 Domain specific ontology concept in the web interface .... 52
3-5 Ontology assisted search results .... 53
3-6 System architecture diagram .... 53
4-1 [...] Capturing Water with Rain Barrels .... 64
5-1 Fragment of the top level part of the WCDL ontology .... 80
5-2 Fragment of the domain specific part of the WCDL ontology .... 81
5-3 Web page corresponding to the ontology concept shown in Figure 5-2 .... 81
5-4 Publication individual with relationships .... 82
5-5 Web page corresponding to publication individual shown in Figure 5-4 .... 82
5-6 Fragment of the WCDL ontology in ObjectEditor .... 83
5-7 Fragment of the simulation ontology in LyraBrowser .... 84
5-8 [...] Advanced Guidelines for Preparing Water Conservation Plans .... 84

PAGE 9

5-9 [...] Herbicide Transport in a Restored Riparian Forest Buffer System .... 85
5-10 Summary of concept extraction using Stanford Parser .... 85
5-11 Zipf's law for Reuters Corpus Volume 1 .... 86
5-12 Frequency distribution of concepts in the BMP Publication Library .... 86
5-13 Fragment of NAL Thesaurus .... 87
5-14 Different similarity measures and corpora against NAL distance .... 87

PAGE 10

LIST OF ABBREVIATIONS

BMP      Best Management Practice
CL       computational linguistics
IE       information extraction
IR       information retrieval
KOS      knowledge organization system
NAL      National Agricultural Library
NLP      natural language processing
TF-IDF   Term Frequency-Inverse Document Frequency
VSM      Vector Space Model
WCDL     Water Conservation Digital Library

PAGE 11

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

INFORMATION RETRIEVAL WITH CONCEPT DISCOVERY IN DIGITAL COLLECTIONS FOR AGRICULTURE AND NATURAL RESOURCES

By

Lukasz Ziemba

May 2011

Chair: Howard Beck
Major: Agricultural and Biological Engineering

The amount and complexity of information available in digital form is already huge, and new information is being produced every day. Retrieving information relevant to a particular need becomes a significant issue. This work utilizes knowledge organization systems (KOS), such as thesauri and ontologies, and applies information extraction (IE) and computational linguistics (CL) techniques to organize, manage and retrieve information stored in digital collections in the agricultural domain. Two real-world applications of the approach have been developed and are available and actively used by the public. An ontology is used to manage the Water Conservation Digital Library, which holds a dynamic collection of various types of digital resources in the domain of urban water conservation in Florida, USA. The ontology-based back end powers a fully operational web interface, available at http://library.conservefloridawater.org. The system has demonstrated numerous benefits of the ontology application, including accurate retrieval of resources, information sharing and reuse, and has proved to effectively facilitate information management. The major difficulty encountered with the approach is that

PAGE 12

the large and dynamic number of concepts makes it difficult to keep the ontology consistent and to accurately catalog resources manually. To address these issues, a combination of IE and CL techniques, such as the Vector Space Model and probabilistic parsing, together with an agricultural thesaurus, was adapted to automatically extract the concepts important for each of the texts in the Best Management Practices (BMP) Publication Library, a collection of documents in the domain of agricultural BMPs in Florida, available at http://lyra.ifas.ufl.edu/LIB. A new approach to domain-specific concept discovery using an Internet search engine was developed. Initial evaluation of the results indicates significant improvement in the precision of information extraction. The approach presented in this work focuses on problems unique to the agriculture and natural resources domain, such as domain-specific concepts and vocabularies, but should be applicable to any collection of texts in digital format. It may be of interest to anyone who needs to effectively manage a collection of digital resources.

PAGE 13

CHAPTER 1
INTRODUCTION

Information Technology (IT) was born with the first information carriers: speech and simple drawings, such as the petroglyphs shown in Figure 1-1. The rapid advancement of IT as we know it today started with the invention of the microprocessor in 1971. At some point IT became an irreplaceable part of human societies, not only serving but shaping them. As stated by Thomas Jefferson, information is a currency. By providing access to IT carriers, information sources became more dispersed and enabled individuals to produce and share information with others.

Nowadays IT is present in almost every aspect of our life, from driving directions, shopping, reviewing the latest news and checking the weather to finding a technical report. Each of these activities involves some kind of information retrieval (IR). Checking the weather is easy: just type [...] current conditions, forecast, etc. Finding a technical report may be more involved and depends on how much can be specified about what is needed, that is, about the information need. If the title of the report is known, then the task should be pretty straightforward; if only the author is known, then it might require identifying the person and then browsing through the publications he or she authored. Things get more complicated when a [...] text search engines, such as the ubiquitous Google, may return thousands of results, many of which are not relevant to the information need. Moreover, the results lack any structure: a technical report can be shown next to an experimental data set, a culinary recipe, a personal blog or a page of some institution.

The amount and complexity of information available in digital form is already huge, and ongoing developments in the field of

PAGE 14

agriculture and natural resources result in increasing amounts of information being generated by the university system, state agencies, consulting agencies and other organizations. A plethora of information types has been produced, such as technical reports and other types of publications, experimental data and other data sets, decision support systems, simulation models and more. Making information available in an organized fashion is the research area known as knowledge organization.

Knowledge Organization Systems

A knowledge organization system (KOS) is a scheme for organizing information. It is used to manage a collection and organize materials for the purpose of retrieval (Hodge 2000). An important aspect of KOS that facilitates IR is vocabulary control. Vocabulary control is a method of dealing with the ambiguity of natural language when describing things (concepts). Major problems of traditional text search (with uncontrolled terms) result from variations in search queries and from differing conceptualizations of the information need (Tudhope & Nielsen 2006). People often use different words to describe the same concept or have different concepts in mind when using the same (ambiguous) words. Controlled vocabularies consist of terms selected by domain experts and knowledge engineers to represent concepts. They attempt to address the ambiguity of natural language by defining the scope of terms and often providing a set of synonyms for each concept. Concepts may be further organized in KOS with hierarchies and other, more complex relationship structures. These structures are then used to: describe the resources in the collection (this process is often referred to as cataloging); specify the information need of the user; and

PAGE 15

match the information need with relevant resources. As explained above, a KOS provides a link between the information need (the user) and the material (the collection).

Different kinds of KOS have different degrees of vocabulary control, richness of relationships or formality levels. Examples of KOS, ordered by increasing complexity, include: authority files, glossaries, taxonomies, thesauri, and ontologies. An authority file is a list of names (such as geographical or species names), a glossary provides definitions for its terms, a taxonomy introduces hierarchical relationships, a thesaurus provides equivalence and simple associative relationships, and an ontology can represent concepts with complex relationships. Two KOS that this work focuses on are thesauri and ontologies.

Thesaurus

A thesaurus is a collection of terms with a small, standard set of relationships developed primarily for the purpose of indexing and retrieval (Soergel et al. 2004). Thesauri define three types of relationships among the terms: hierarchical, equivalence and associative. Hierarchical relationships are indicated by broader (more general) and narrower (more specific) term designations in the thesaurus. Equivalence relationships are used when two or more terms represent the same concept, such as synonymous terms, common names of organisms and their scientific equivalents, spelling variants, usage variants, and acronyms (NAL 2011a). Associative relationships are those designated as neither hierarchical nor equivalence. They point to other related concepts in the thesaurus that may be of interest. Thesauri distinguish between descriptors and non-descriptors, often referred to as preferred terms and non-preferred terms, respectively. A descriptor uniquely and

PAGE 16

unambiguously represents a concept, and only a descriptor should be used when referring to the concept. A non-descriptor is linked through an equivalence relationship to the corresponding descriptor that must be used instead. There are no relationships from one non-descriptor to another.

Currently, a number of generic thesauri, such as the Library of Congress Authorities and Vocabularies (Library of Congress 2011), and domain-specific thesauri are available online. The major agricultural thesauri in English are the National Agricultural Library (NAL) Thesaurus (NAL 2011b), the AGROVOC Thesaurus (FAO 2011a) and the CAB Thesaurus (CABI 2010). The total number of terms ranges from 40 000 in AGROVOC to 98 000 in CAB; a brief comparison of the three thesauri is presented in Table 1-1.

When working with thesauri one must be aware of their limitations. A thesaurus provides a very limited and informal set of relationships between concepts and lacks a distinction between concepts (meanings) and their lexicalizations (words) (Soergel et al. 2004). These shortcomings may lead to significant inconsistencies within a single thesaurus and between various thesauri. As noted by Bartol (2009), the subject scope of many terms can be vague and is frequently not clarified in sufficient detail. The same terms that stand for a narrower or broader concept in one thesaurus will, in another, stand for a related term or be treated as a non-descriptor. The choice of descriptors and related terms can be arbitrary, and is often inconsistent and incomplete. In general, the agricultural thesauri are too ambiguous and inconsistent to be used as the only criterion for knowledge representation, but can still serve as a good reference tool (Bartol 2009). Given their more precise and unambiguous semantics, ontologies allow for more

PAGE 17

complex and rich representation of domain knowledge, which could benefit IR (Kim & Beck 2006).

Ontology

The term ontology comes from [...] (being) and [...] (knowledge) and [...] a science (or study) of being. It has been used for many centuries in philosophy and metaphysics (Corazzon 2009). With the advancement of computer technology, ontologies have been adopted to represent and organize information in the fields of knowledge representation, library science, IR, natural language processing or Internet search engines (Chandrasekaran et al. 1999). The most concise and widely used definition of an ontology as used in computer [...] a [...] (Gruber 1993). The definition can be expanded as a formal representation of a body of knowledge formed by a collection of concepts and their relationships describing a particular domain (Gruber 2009). Compared to a thesaurus, an ontology provides means for a much richer (and therefore more precise and unambiguous) representation of concepts with their relationships.

In an ontology, concepts are represented by individuals, classes and properties. Individual [...] Class represents a set of individuals that belong together according to their [...] Property represents a relationship either between individuals or between individuals and data values, for example:

PAGE 18

[...] Citrus Irrigation Management [...]

A property restriction is a characteristic of a class, meaning that all individuals of a particular class are required to have certain properties with certain value types. For [...] one [...]. A domain of a property is a set of individuals to which the property is applied. A range of a property is a set of individuals that the property has as its value. Similarly to hierarchical relationships of a thesaurus, classes may have subclasses (more specific classes) and superclasses (more general classes), for example a class [...] meet property restrictions of its superclass, for example all [...] must have at least 1 [...]. Hierarchical relationships in an ontology are also referred to as taxonomical, or vertical, relationships, while non-hierarchical ones are sometimes called horizontal relationships. Furthermore, the term hyponym is used when referring to more specific relationships and hypernym for more general, or is-a, relationships.

An important feature of ontologies is that they describe knowledge in a way that is readable for machines (computers). This characteristic enables knowledge sharing and reuse: information resources can be communicated between either humans or computer software. For these purposes, the Web Ontology Language (OWL) (Dean et al. 2004) has been developed. OWL is an XML-based semantic markup language for publishing and sharing ontologies. It was designed to be processed by computer applications, and is not meant to be presented to humans. OWL is an official World Wide Web Consortium (W3C) Recommendation, which means that after multi-stage

development, review and testing, it is a standard recommended for wide deployment. Figure 1-2 presents an example of an individual with three properties in OWL, including the isAuthorOf property, which relates the individual to another individual.

During creation or maintenance of an ontology (or another kind of KOS), especially when a new ontology is being developed from scratch, the number of individuals, classes and properties to be processed is typically huge (Caracciolo et al. 2007). In many cases the amount of information that needs to be checked becomes unmanageable by means of standard manual inspection, which is one of the reasons to resort to some kind of automation of the process. Techniques that serve this purpose are studied in the field of information extraction.

Information Extraction

Information extraction (IE) may be defined as the activity of populating a structured information scheme (such as a KOS) from an unstructured text information source (Gaizauskas & Wilks 1998). It involves identification of instances of a particular class in a natural language text, and creation of a structured representation of the information drawn from the text (Grishman 1997). The major problem that IE techniques have to address is capturing the meaning of words, which is expressed in natural language using ambiguous linguistic structures, often implicitly, as shown in Figure 1-3.

The meaning of words is not determined by abstract dictionary definitions; it is tied to representations of world entities and is reflected in the way the words are used in natural language (Lenci 2008). Words that are used in similar contexts tend to have similar meanings (Harris 1954); this idea is known as the

distributional hypothesis. Moreover, in later stages of their development, children learn to use certain words (especially abstract ones) from context, before encountering the corresponding world entities (Firth 1957). Based on the distributional hypothesis a number of approaches were developed, including pattern-based extraction (Hearst 1992), association measures (Lenci 2008) and Vector Space Models (VSMs) (Sahlgren 2006, Guo 2008), that perform statistical analysis to identify significant word co-occurrence patterns. Moreover, the use of information available on the Web, such as page counts or text snippets returned by a Web search, can further support the statistical methods (Bollegala 2007).

In a VSM the meaning of a word is represented as a vector whose coordinates are the frequencies of its co-occurrence with other words. Formally, a VSM is defined as the quadruple $\langle T, B, M, S \rangle$ (Lenci 2008), where:

T is the set of targets, usually words, to which the space provides a semantic representation,
B is the basis that defines the space dimensions: the linguistic contexts used to compute the distributional similarity,
M is a co-occurrence matrix holding the co-occurrence frequencies of targets with the basis contexts,
S is some measure of distance between points in the space, such as cosine or Euclidean distance.

Targets are the words whose relationships are going to be examined and basis contexts are all words in the text. The co-occurrence matrix contains the target words in rows and the contexts in columns. The matrix values are frequencies of co-occurrence of the words in a given row and column. The co-occurrences are typically counted within a context window spanning a certain number of words (Sahlgren 2006). For this purpose

the notion of an n-gram is used, which means a subsequence of n words. An example of a co-occurrence matrix is shown in Figure 1-4. The matrix is based on the sentence "Whereof one cannot speak thereof one must be silent": the targets are all words in the sentence, the basis (contexts) is all words in the sentence as well, and the co-occurring words are the ones adjacent to each other. In order to show the example VSM graphically, two context words are shown in Figure 1-5. The semantic similarity between words can be approximated with vector similarity/distance measures in the VSM space. Commonly used measures are Euclidean distance and cosine.

Computational Linguistics

Computational linguistics (CL) is a term often used interchangeably with natural language processing (NLP) that denotes a discipline between linguistics and computer science, concerned with computational aspects of the human language (Radev 2001). It deals with analyzing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks. When understanding or generating language, people utilize various types of language processing. This observation underlies the CL approach, in which several levels of linguistic analysis can be distinguished: phonological, morphological, lexical, syntactic, semantic, discourse and pragmatic.

The phonological level is the lowest level of language processing, at which speech sounds that constitute words are interpreted. It is only utilized in CL systems that deal with spoken input or output.

The morphological level studies how morphemes, the smallest units of meaning, can combine in a given language to form words. Words can be broken down into their constituent morphemes in order to learn about their meaning. For example, the suffix -ed at the end of a verb indicates that the action of the verb took place in the past.

The lexical level interprets the meaning of individual words. Depending on the particular approach taken by the CL system, specific techniques are applied to find the meaning of a word, which may require various types of resources. For example, with the help of a lexicon all possible part-of-speech tags may be assigned to each word.

The syntactic level looks at the relationships between words in a sentence to discover its structure. In most languages syntax carries meaning because the order of words influences the meaning. The sentences "The grass feeds the river." and "The river feeds the grass." illustrate how syntactic differences may change meaning: the only difference is syntactic (the order of the words), but they have different meanings.

The semantic level is focused on determining meaning by looking at a wider context. It examines various possible meanings by looking at the interactions among the meanings of words in the sentence. In probabilistic systems, the most probable meaning in the given context is determined for words with multiple meanings, a process referred to as semantic disambiguation. For example

a wider context is used to choose the right sense; the disambiguation is performed at the semantic level.

The discourse level looks at longer portions of text that contain multiple sentences and attempts to make connections between them. Common techniques used at this level are anaphora resolution and text structure recognition. Anaphora resolution is the problem of resolving what a pronoun or a noun phrase refers to. Text structure recognition determines the functions of sentences in the text, which, in turn, adds to the meaningful representation of the text. For example, newspaper articles can be deconstructed into discourse components such as: lead, main story, previous events, evaluation, attributed quotes, and expectation (Liddy et al. 1993).

The pragmatic level analyzes overall intent or purpose. It is a high-level use of language that incorporates information based on, but not necessarily included in, the text directly. It utilizes world knowledge and context beyond the text.

CL uses computers to process written and spoken language for some practical purpose, such as to translate between languages, to retrieve information from web pages, to communicate with machines, etc. The steps involved in CL analysis for information extraction from text include: tokenization, stop word removal, stemming and lemmatization, and parsing.

Tokenization

The initial, most basic step in CL analysis is tokenization. It is the process of dividing the input text into pieces (tokens) (Manning & Schütze 1999). In CL the tokens are words and other symbols like numbers, punctuation marks, etc. It is not always clear how to define what a word is, and there are many approaches to dealing with this issue in practice. Kucera and Francis define a graphic word as "a string of contiguous

alphanumeric characters with space on either side; may include hyphens and apostrophes, but no other punctuation marks" (Kucera & Francis 1967). There are many cases where the above definition does not work, for example prices like $42.

Tokenization is language-specific: the language of the input text needs to be known. There are many issues that make tokenization of English-language texts a complex task:

whitespace sometimes appears within desired tokens: multipart names like Los Angeles, phone numbers like 873 4578 and words like data base should be regarded as single tokens, although they contain a whitespace symbol;
hyphenation is used for various purposes, for example co-worker, e-mail, 4-year-old. It is often unclear whether a hyphenated word should be treated as one or many tokens, as in the case of 4-year-old;
periods can denote ends of sentences or abbreviations, and there are cases when one period performs both of these functions (haplology). They also appear in filenames, email addresses, URLs, etc.;
apostrophes can be used in contractions (he'll, don't), which can be treated as one or two tokens, in the possessive clitic 's (today's meal), in plural possessives (boys' toys), and as opening and closing quotation marks;
combinations of two or more of the above issues: for example, the combination of whitespace and hyphenation makes the itinerary New York-Los Angeles particularly difficult to tokenize correctly.

Stop Words

Stop words are the words removed from the input text during the analysis. They are the most common words in a particular language and carry the least information, for example a, an, the, to. Many IR systems treat stop words as having little value in helping to select documents matching the user's needs (Manning & Schütze 1999) and exclude them from the vocabulary entirely in order to save resources. This is especially true of older systems, which used lists of 200-300 stop words, while in newer systems

these lists tend to be much shorter (7-12 words) (Manning et al. 2008). Many modern IR systems do not utilize this technique at all because of the potential loss of information.

Stemming and Lemmatization

The expressiveness of natural language manifests itself in the phenomenon that the same idea can be expressed in different words or in various forms of the same word. The goal of stemming is to capture the relationships between different variations of a word. More precisely, stemming tries to find a common stem of the different forms of a word that occur because of inflection (e.g., plurals, tenses) or derivation (e.g., transforming a verb to a noun by adding a suffix) (Croft et al. 2009). The Cambridge dictionary defines the stem as the part of a word that is left after taking off the part which changes when forming a plural or past tense (Cambridge Advanced Learners Dictionary 2003). Many stemming algorithms have been developed, with the Porter stemmer (Porter 1980, Porter 2006) being the most popular. Other approaches, such as the Krovetz stemmer (Krovetz 1993), enhance the stemming algorithm with the use of a dictionary.

Lemmatization is generally defined as the transformation of all inflected word forms contained in a text to their dictionary look-up form (lemma) (Boot 1980). The Webster dictionary defines the lemma as a word or phrase that is glossed, or a headword (Random House Webster's Electronic Dictionary 1992).

Stemming usually refers to a simpler process of removing the ends of words in the hope of performing it correctly most of the time. Lemmatization usually refers to a similar procedure, but with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the lemma (Manning et al. 2008).
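
The first three preprocessing steps described above (tokenization, stop word removal and stemming) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the stop list, the suffix list and all function names are assumptions made for illustration, and the suffix stripper is a deliberately naive stand-in rather than the Porter or Krovetz algorithm.

```python
import re

# Illustrative stop list (an assumption; real systems use curated lists).
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "in", "is"}

# Hypothetical suffix list for a naive stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase tokens; hyphens and apostrophes may occur inside a token."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters of the token."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    print(tokenize("The co-worker's e-mail"))
    print(preprocess("The farmers irrigated the fields"))
```

Note that the naive stemmer returns "irrigat" for "irrigated", which is not a dictionary word. This is the kind of result that separates stemming from lemmatization, which would return the lemma "irrigate".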

Studies on large numbers of English-language queries show that stemming does not improve the performance of IR systems at an aggregate level (Salton 1989, Hull 1996), and full morphological analysis produces at most very modest benefits for retrieval (Manning et al. 2008). These techniques can give significant benefits for some queries while hurting the performance of others; they generally improve recall and harm precision. The English language has relatively little morphology, and the benefits of stemming and lemmatization can vary in languages with more complex systems of inflection and derivation.

Parsing

To parse means to analyze (a sentence) in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc. (Random House Webster's Electronic Dictionary 1992). A natural language parser is a computer program that identifies grammatical structures in input text, for example which words go together as phrases and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences (Stanford NLP Group 2010).

The ambiguity and open character of natural languages suggest that characterizations of a natural language should be based not only on theoretical linguistic knowledge but also on what is actually found in registrations of language use (Bunt & Nijholt 2000). The most commonly available and used registrations of natural language use are corpora. A corpus is a collection of a single writer's work or of writing about a particular subject, or a large amount of written and sometimes spoken material collected to show the state of a language (Cambridge Advanced Learners Dictionary 2003). The availability of sufficiently large and rich corpora, especially the syntactically

tagged ones, led to the development of powerful probabilistic grammars and parsers: CL tools based on the frequency of occurrence of linguistic structures.

The Stanford Parser (Klein & Manning 2003, Stanford NLP Group 2010) is a Java implementation of a probabilistic natural language parser. The package provides an accurate probabilistic context-free grammar (PCFG) parser. A PCFG is a context-free grammar in which each production rule is augmented with a probability (Manning & Schütze 1999). The probability of a parse is the product of the probabilities of the productions used in that parse. A PCFG consists of:

a set of terminals (English words, e.g. capturing, water),
a set of nonterminals (Penn Treebank tags, e.g. S, VP, NP),
a start symbol (the S tag in Penn Treebank),
a set of productions (rules) that assign to each nonterminal a sequence of terminals or nonterminals (e.g. NNP → water),
a set of probabilities on rules (e.g. P(NNP → water) = 0.1), such that the sum of the probabilities of the rules for each antecedent nonterminal is 1.

Applications

A selection of the techniques described in the previous sections of this chapter was implemented in two real-world applications. The Water Conservation Digital Library (CFWC 2010) is a dynamic collection of various types of digital resources in the domain of urban water conservation in Florida. An ontology was developed to manage the collection and facilitate information retrieval (Ziemba et al. 2011). The Best Management Practices (BMP) Publication Library (IFAS BMP 2010) is a collection of documents in the domain of agricultural BMPs in Florida. A combination of various IE and CL techniques, such as stop words, stemming, probabilistic parsing and

VSM, with the use of the NAL Thesaurus and the Stanford Parser, was adapted to automatically extract concepts important for each of the texts in the collection.

Contribution

Within this work, new approaches and techniques developed in the fields of knowledge organization, information extraction and computational linguistics were adopted and applied to the field of agriculture and natural resources. The major contributions to this field and to the general body of science are summarized as follows:

An ontology KOS was adapted to address the information retrieval problem in the Water Conservation Digital Library;
An ontology describing the domain of urban water conservation in Florida was created with the help of Dr. Camilo Cornejo, who was responsible for knowledge engineering;
A combination of statistical and CL techniques, including the Vector Space Model and probabilistic parsing, with the use of the Agricultural Thesaurus, was adapted to address the information extraction problem in the BMP Publication Library;
A new approach of domain-specific concept discovery with the use of an Internet search engine, which improves the precision of information extraction, was developed; and
A new web-based interface was developed that allows for accessing and authoring the information in the Water Conservation Digital Library and the BMP Publication Library; the web sites have been available and actively used by the public since 2008 and 2010, respectively.

The rest of this work is organized as follows: Chapter 2 presents a review of related work, Chapter 3 presents details of the development and application of an ontology to the Water Conservation Digital Library, Chapter 4 presents the methodology of information extraction in the BMP Publication Library, Chapter 5 discusses the results of this work, and Chapter 6 presents the conclusions and directions for future work.

Table 1-1. Agricultural thesauri in the English language (sources: NAL 2011c, FAO 2011b, CABI 2010)

Thesaurus   Website                                          Descriptors /      Non-descriptors /
                                                             Preferred terms    Non-preferred terms
NAL         http://agclass.canr.msu.edu/agt/agt.shtml        48,300             33,800
AGROVOC     http://aims.fao.org/website/AGROVOC-Thesaurus    31,900             8,500
CAB         http://www.cabi.org/cabthesaurus                 66,000             32,000

Figure 1-1. Prehistoric petroglyphic imagery from the Western U.S. (source: http://www.tcf.ua.edu/Classes/Jbutler/T389/ITHistoryOutline.htm)

Figure 1-2.

<person rdf:ID="JohnSmith">
  <firstName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">John</firstName>
  <lastName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Smith</lastName>
</person>

Figure 1-3. Information extraction: from natural language to KOS. [Diagram: natural language (ambiguous linguistic structures, implicit knowledge) -> information extraction -> KOS (formal structure, explicit knowledge)]

Figure 1-4. Co-occurrence matrix for the sentence "Whereof one cannot speak thereof one must be silent" (source: Sahlgren 2006)

Figure 1-5. A 2-dimensional space with vectors one = (1, 1) and speak = (0, 1). [Plot: axes labeled whereof and thereof, each running from 0 to 1; points speak (0, 1) and one (1, 1)]
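
The vectors of Figure 1-5 can be recomputed from the example sentence with a short Python sketch. It assumes a context window of one word on each side (matching the statement that co-occurring words are the ones adjacent to each other) and uses whereof and thereof as the two basis contexts; the function names are illustrative and not part of any library.

```python
import math
from collections import defaultdict

SENTENCE = "whereof one cannot speak thereof one must be silent"


def cooccurrence(tokens, window=1):
    """Count how often each context word occurs within `window` positions of each target."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts


def vector(counts, target, basis):
    """Project a target word onto the chosen basis contexts."""
    return [counts[target][b] for b in basis]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


counts = cooccurrence(SENTENCE.split())
basis = ["whereof", "thereof"]          # the two context words used as axes
one = vector(counts, "one", basis)      # [1, 1]
speak = vector(counts, "speak", basis)  # [0, 1]

if __name__ == "__main__":
    print(one, speak, cosine(one, speak), euclidean(one, speak))
```

With only two basis contexts the two measures are easy to check by hand: the Euclidean distance between (1, 1) and (0, 1) is 1, and the cosine is 1/sqrt(2).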

CHAPTER 2
RELATED WORK

This chapter surveys current leading techniques in the fields of information retrieval, knowledge organization, information extraction and computational linguistics that were the focus of this research. Attempts at using these techniques to build information systems in agriculture and natural resources are described.

Agriculture and Natural Resources Applications

In recent years ontology applications have become increasingly popular in various specialized domains, including agriculture and natural resources (Keet 2009). A wide array of case studies includes the domains of rice production (Thunkijjanukij et al. 2009), crop wild relatives (Morten 2007), fisheries (Caracciolo et al. 2007), an agricultural question answering system (Vila & Ferrández 2009) and agricultural systems simulation and modeling (Beck et al. 2010).

Rice production. Thunkijjanukij et al. (2009) developed a plant production ontology for rice production. The ontology was constructed manually with the help of domain experts and contained over 2300 concepts and 5600 terms, with hierarchical, associative and equivalence relations, and allows reasoning about rice production knowledge. Concepts from the ontology were compared with existing terms in the Thai AGROVOC Thesaurus, and it was concluded that about 48% of the terms in the ontology already existed in the thesaurus. Furthermore, the study proposed a set of criteria and rules for semi-automatic construction and maintenance of plant production ontologies. The rules were designed to set guidelines for computer operation, and they can be used for adding ontology terms and concepts by both humans and computer software. The authors argue that semi-automatic approaches to ontology maintenance are

attractive, as the cost of employing domain experts is high in terms of both time and expense. However, domain expertise was found indispensable for building a well-structured ontology that can accurately represent knowledge.

Crop wild relatives. Hulden (2007) describes the development of an ontology relevant to crop wild relatives (CWR), defined as non-domesticated plants that are genetically related to crops, considered important genetic resources, and used to improve yields and the nutritional quality of crops. The ontology was constructed manually from an existing database created in a previous project, from on-line sources such as glossaries, dictionaries, publications and thematic websites, and from CWR descriptor lists (lists used in botany that contain descriptions). The CWR ontology represents themes such as agriculture, botany, environment and protection, earth and soil sciences, genetics, and law and resource management. The author states that, in retrospect, it may have been a mistake to add the descriptor list terms at the early stage of ontology development (when the set of main terms was identified and selected), because of the implications this had for creating the ontology structure. Adding terms from the CWR descriptor lists considerably slowed this process because of the number of unrelated, odd terms, such as professional titles, not directly relevant to CWR.

Fisheries. Caracciolo et al. (2007) report results of experiments with ontology learning technologies applied to the fisheries domain at the Food and Agriculture Organization (FAO) and provide a set of recommendations based on the experiments. The study covers semi-automatic techniques used to acquire knowledge from domain-specific documents and existing ontologies. The techniques adopted in the work include terminology extraction, similarity induction, relation extraction and ontology mapping.

Terminology extraction was used to obtain terms from websites and from a full-text digital repository in the domain of fisheries. The terms served as suggestions of new concepts to domain experts, who validated them in order to populate the ontology. The implementation of the technology was based on regular expressions, statistical measures of reliability and a technique based on supervised Named Entity Recognition. Based on the results, the authors showed that terminology extraction was a useful technique, although it was found to require a large validation effort due to the low accuracy of the returned results, at least in the case of a specific domain such as fisheries. The study recommends exploiting the technology to get a first idea of the concepts required to describe new domains on the basis of a corpus of texts, rather than to extensively check the full list of terms. Additionally, further development of terminology extraction is suggested in order to improve the precision of the algorithms.

Another approach presented in Caracciolo et al. (2007) is similarity induction. Its goal was to help domain experts find concepts referring either to the same concept or to a taxonomically related one. The task was approached by adopting distributional similarity techniques such as semantic domains, a technique based on the concept of VSM. The experiments evaluated the effectiveness of similarity induction in identifying taxonomical relationships, such as hypernyms, hyponyms and synonyms, of a selected concept in the ontology. The study does not recommend the use of similarity-based techniques by themselves to solve taxonomy induction problems, but rather to use them

in combination with other techniques. In addition, similarity-based tools could be used by the domain expert to find useful suggestions during ontology design, for example by querying a system whenever a concept is not very well understood or conceptualized.

Agricultural question answering system. Vila & Ferrández (2009) present a question answering (QA) system for the agricultural domain. The system was based on texts from about 200 articles from the Cuban Journal of Agricultural Science. In initial experiments, an open-domain QA system was used with the agricultural journal articles, but very low precision of answers was obtained. The authors argue that the poor performance of the open-domain tools can be attributed to:

complex domain terminology,
the limited size of the document collection (corpus), and
inconsistent formulation of the information need by users.

To address the above issues, the open-domain system was adapted for use in the domain of the agricultural journal by incorporating several domain-specific resources. An ontology with bibliographic information from the agricultural journal was created. The ontology included classes representing journal articles, authors, subjects, etc., with properties such as title, abstract, pages, etc. The ontology was enriched with the AGROVOC Thesaurus (FAO 2011a) and the WordNet lexical database (Princeton University 2010) by mapping relevant terms. CL techniques, including stop word removal and stemming, were used to identify terms in the document collection, and statistical measures, such as TF-IDF scores, were applied to associate articles and subjects with relevant terms. The authors conducted a series of experiments to compare the performance of their approach with the open-domain system. A significant improvement of the IR process was reported in the case of the domain-specific system.
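
As an illustration of the TF-IDF scores mentioned above, the following Python sketch weights each term by its relative frequency in a document multiplied by the logarithm of the inverse document frequency. This is one common variant of the measure and is not necessarily the one used by Vila & Ferrández (2009); the function name and the toy documents are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    documents: list of token lists (already tokenized, stop words removed, stemmed).
    weight = (term count / document length) * ln(number of documents / document frequency)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        term_counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical toy collection, not taken from any of the cited corpora.
    docs = [
        ["citrus", "irrigation", "water"],
        ["sugarcane", "water", "nutrient"],
        ["citrus", "pest", "control"],
    ]
    for doc_weights in tf_idf(docs):
        print(doc_weights)
```

A term that occurs in every document receives a weight of zero, while a term concentrated in a single document receives the highest weight, which is why the measure is useful for associating documents with their characteristic terms.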

Agricultural systems modeling. Beck et al. (2010) present an application of ontologies for the development of soil, water and nutrient models for citrus and sugarcane. The authors developed a new approach to computer modeling of natural systems: ontology-based simulation. The ontology was used for storing information on various model components (such as model structure, equations and symbols) and for automatically generating code for model simulations. The Lyra ontology management system (Beck 2008) was used for construction of models and automatic code generation for modeling hydrological, biological, physical transformation and transport processes. The functionality of ontologies enabled sharing model elements among models having similar sub-systems, so that the most suitable model may be constructed for a given application. The authors point out some potential ontology applications for modeling, such as automatically connecting models and data sources, serving as a model-base framework for models and components, or searching the Internet for available data required for propagating model parameters and input data, assuming that databases containing a wide range of standardized data for modeling were available online. However, this particular study was focused on modeling dynamic systems behavior and was not applied to organizing documents or text-based information resources.

Other Work in IE and CL

Extracting information, such as relevant concepts and relations, to form some kind of KOS in an automatic manner is the focus of many studies. The techniques depend on the sources of information. The work of Lenci (2008) focuses on natural language sources in the form of documents. To facilitate ontology learning, two statistical semantics techniques are used: association measures and Word Space Models, which allow quantifying lexical association strength between words.
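
To make the notion of an association measure concrete, the sketch below computes the log-likelihood ratio (G2) for a word pair from a 2x2 contingency table of co-occurrence counts. It is a minimal illustration of one widely used measure and not code from Lenci (2008) or any cited system; the function name, argument names and the counts in the usage example are assumptions.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Log-likelihood ratio (G2) for a 2x2 contingency table.

    k11: contexts containing both words
    k12: contexts containing the first word but not the second
    k21: contexts containing the second word but not the first
    k22: contexts containing neither word
    """
    total = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts: the pair co-occurs far more often than chance would predict.
    print(log_likelihood_ratio(30, 20, 10, 940))
    # Counts that are exactly independent give a score of 0.
    print(log_likelihood_ratio(10, 10, 10, 10))
```

Higher scores indicate stronger evidence that the two words are associated, so candidate word pairs can be ranked by this value.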

A case study on ontology learning from legal texts (Lenci et al. 2007) suggests a particular association measure: the log-likelihood ratio. It was tested against other association measures, such as mutual information, chi-square, etc., and log-likelihood fared consistently better than the others. Moreover, this measure was found to be less prone to assigning high scores to very sparse pairs. The association measures are incorporated into the Text-2-Knowledge (T2K) software application used for automatic extraction of domain terminology from Italian law texts and medical reports. T2K has a hybrid architecture based on CL modules and statistical filtering. The aim of the statistical filtering in T2K is to rank the list of candidate multiword terms by their lexical association strength. T2K is being developed by the Automated Learning Group at the University of Illinois and a free Academic Use License is available.

Measuring semantic similarity between words remains a challenging task and is the focus of research in the field of CL. A number of proposed approaches, such as Vector Space Models (VSMs) (Guo 2008, Croft 2009, Sahlgren 2006), perform statistical analysis to identify significant word co-occurrence patterns (Alexopoulou et al. 2008, Caracciolo et al. 2007). Statistical methods perform poorly when the data is scarce. Brill et al. (2001) showed that using a large amount of text for the analysis improves the quality of classical statistical methods. To address the problem of data shortage, Turney (2001) proposed using the biggest available data source, which is the Internet. By using information available on the Internet rather than in a single library, the amount of resources

available for the analysis increases vastly, greatly enhancing the robustness of the statistical measures (Sanchez & Moreno 2007). Studies such as Cimiano & Staab (2004) and Cilibrasi & Vitanyi (2004) reported successful use of Internet search engines to obtain robust statistics. In Bollegala (2007), information available on the Internet, such as page counts or text snippets returned by a Web search, enhances the use of statistical methods.

Sanchez & Moreno (2007) present a general-purpose tool for structuring electronic resources in an unsupervised, automatic way. The approach is based on knowledge acquisition from text. Instead of using a preselected domain corpus or some degree of supervision and previous knowledge, the approach uses the Internet as a learning corpus for developing knowledge structures. The tool provides easier access to resources in digital repositories by automatically constructing taxonomies for those resources according to the main topics discovered for a particular domain. The taxonomies are created:

without any human expert supervision (which makes searching of large domains feasible),
automatically (which allows for easy updates of dynamically developing repositories),
in a domain-independent way that can easily be applied to other domains.

VIVO

VIVO is an information infrastructure for scholarly and research information (VIVO 2011a). Its major goal is to enable discovery of research information, such as publications, researchers, departments, projects, etc., within an institution as well as across multiple institutions. VIVO was initially developed by Cornell University in 2002-2005 and was meant to provide information resources to people looking for research

information of the University (Lin 2010). In 2009 the National Center for Research Resources of the National Institutes of Health awarded a $12.2M stimulus grant to the University of Florida and six partner institutions (Cornell University; Indiana University; Ponce School of Medicine, Puerto Rico; The Scripps Research Institute; Washington University School of Medicine in St. Louis; and Weill Cornell Medical College) to enable national-level networking of the science community using VIVO as a framework (Holmes 2010). In 2010 the U.S. Department of Agriculture (USDA) committed to participate in VIVO. The USDA became the first federal organization to make use of VIVO (Kaplan 2010). USDA will be using VIVO for organizing references about federal agriculture research, scientist collaboration and networking.

Information in VIVO is stored as triples consisting of a subject (an individual), a predicate (an object property or a data property) and an object (an individual). The triples reflect the structure of a sentence in ordinary language, and ontology data structures are often put into the form of triples (a particular ontology can be expressed using a large number of triples). Subject-predicate-object triples express the relationships among individuals in VIVO, for example: a person (subject) is author of (predicate) a publication (object).

With VIVO, one can create and load ontologies, edit relationships, build a public web site to display data, and search it (VIVO 2011b). VIVO can contain contact information, publications, research grants, educational background, honors and awards, research descriptions, teaching efforts, and other items of interest. The system uses CL and IE techniques for automatic data ingestion from various sources, such as university human resources systems or bibliographic databases, in order to minimize the need for manual input. VIVO supports browsing and


a search function which returns faceted results. It also offers visualization functionality, shown in Figure 2-1. The implementation of VIVO at the University of Florida as of January 2011 contains information about:
- 16000 grants
- 15000 people
- 5000 organizations
- 1000 publications
- 400 events (seminars, talks, conferences)
- 275 geographic locations
- 800 subject areas

A closer look at the content reveals many deficiencies of the system at its current stage. There are many duplicate items, and the information is not well organized in terms of being assigned to topics or even categories. University of Florida. College of Medicine University School (while most other universities are listed as University ). The system of categories (such as people, organizations, publications, etc.) is supposed to be the main way of identifying the type of information one is looking at, but the categories are inconsistent and overlapping, and the relationships between categories are unclear: the hierarchy, if existing, is not shown. For example, University, School, College or Department are all subcategories of Organization with no relationships between each other. Individuals are organization within College of Liberal Arts and Sciences. The topics (or subject areas) are not related with each other at all. This means that, for example, the system does not know if and how crop water requirements are related to irrigation management, although it contains both topics. All of this hinders browsing and the usefulness of the faceted search results, which


are supposed to be the core functionality and the main selling point of VIVO. It may also indicate that most of the data was automatically ingested from various databases without verification.

Summary

Related work presented in this chapter includes numerous ontology applications in the agriculture and natural resources domain, such as rice production (Thunkijjanukij et al. 2009), crop wild relatives (Morten 2007) and agricultural systems simulation and modeling (Beck et al. 2010), that adapt manual approaches for ontology development. At the same time the need for automation of this process is expressed (Thunkijjanukij et al. 2009). While fully automatic systems a by some authors (Caracciolo et al. 2007), attempts are made to adapt semi-automatic techniques, but none of the related work presents a working implementation of such an approach. Furthermore, IR facilities in agricultural collections are limited mainly to traditional text search methods. The work presented in the following chapters goes a step beyond other agriculture and natural resources projects by applying an ontology to facilitate the IR process and adapting IE and CL techniques to facilitate ontology construction.


Figure 2-1. Visualization using a VIVO co-author network (source: https://vivo.ufl.edu/visualization?uri=http%3A%2F%2Fvivo.ufl.edu%2Findividual%2Fn23735&vis=person_level&render_mode=standalone)


CHAPTER 3
ONTOLOGY APPLICATION METHODOLOGY

An ontology approach has been adapted and applied to the Water Conservation Digital Library (WCDL) (Ziemba et al. 2011), which holds a dynamic collection of various types of digital resources in the domain of urban water conservation in Florida. The WCDL was established to facilitate making water-related information available in an organized fashion. The main objectives of the library are identifying, organizing and making accessible various types of information in the domain of urban water conservation in Florida. For this purpose an information management system was developed that integrates all aspects of information and delivers content to decision makers. The core of the system is the ontology, which contains all relevant information and allows for effective management and presentation of a wide variety of information types coming from various sources.

Ontology Development

The ontology used to manage the WCDL was developed following a methodology used previously by Uschold & King (1995), Beck & Pinto (2002) and Pinto & Martins (2004). The main steps in this process are:
- Specification: the purpose and scope of the ontology are identified;
- Conceptualization: a conceptual model that describes the ontology is created according to the specification;
- Formalization: the conceptual model is transformed into a formal model written in a formal way;
- Implementation: the formal model is implemented using an ontology authoring tool;
- Maintenance: updates, additions and corrections to the implemented ontology are made.


Specification

The purpose of the WCDL ontology is to facilitate IR by identifying, organizing and making accessible and searchable various types of information in the domain of urban water conservation in Florida. The information managed by the library can be used for regulatory purposes by water management agencies, water utilities and other organizations.

Conceptualization and Formalization

Following the above specification, two parts of the ontology were developed for the library: a top-level ontology that defines all classes of library resources, and a domain-specific ontology about urban water conservation in Florida. Subsets of operations, parameters and constraints from existing ontologies were selected when possible and Since the goal of the project was to develop a library, all individuals fell under the top-level classes. For each top-level class, property restrictions with corresponding data types were defined. The bibliographic description followed the Dublin Core (DCMI 2011) metadata standard, which during later stages of the project proved useful for sharing the information through a standard protocol, the Open Archives Initiative (OAI 2008).

The domain ontology was developed around a set of terms predefined by water management organizations in Florida. That list of terms was then related to terms from other relevant sources, such as the Water Science Glossary of Terms (USGS 2011), the NAL Thesaurus, the WaterWiser Glossary of Common Water Terms (AWWA 2011), the Water Words Dictionary (NDWR 2011) and other documents on water conservation.


Implementation and Maintenance

The model developed in the previous step was implemented using the ObjectEditor authoring tool (Beck 2008), shown in Figure 3-1. ObjectEditor is a low-level tool for visualizing the ontology as a node-and-link style graph diagram. Diagrams are built by creating classes and individuals as nodes, and connecting related concepts using links to build associations. The diagrams are manually composed into a desired layout. The maintenance of the ontology was performed using the ObjectEditor tool as well.

An example of a book individual with various properties in OWL is presented in Figure 3-2. The property isReferenceOf relates the book individual with a concept in the domain-specific ontology under which it is catalogued. Other relationships would point to In the above example the property isReferenceOf reference would help find that book when looking for information about xeriscaping. The Ontology Application section further explains how relationships facilitate finding related information in the library.

Knowledge Acquisition

Knowledge acquisition is performed at all stages of the ontology development. During this process knowledge about the domain is acquired with the help of domain experts and by referring to relevant literature or other existing resources, ontologies in particular. Reuse of what exists has proven successful in many areas, such as software engineering, medical systems, and environmental information systems (Ding & Fensel 2001). The best candidate ontology is the one that can most easily be adapted to become the new ontology. Problems arise when trying to capture the concepts and


their relationships from existing working systems; many issues are related to inconsistencies, redundancies, and conflicts. No ontologies specific to the water conservation domain could be found, so existing ontologies could not be reused in this project. Nevertheless, some concepts in existing general-purpose ontologies match the ones in the WCDL ontology, allowing for possible interoperability and expandability. Table 3 ( Cycorp 2011 ) Suggested Upper Merged Ontology (SUMO) (Pease 2011) and the WCDL ontology. In ontologies. This shows that at a higher level there is some agreement between concepts, but only a domain ontology can cover a specific topic in detail. Another example is 3 2 which is vaguely defined in OpenCyc, but very detailed in SUMO. The definition used in the WCDL ontology is more closely related to the one in SUMO.

Ontology Application

The WCDL ontology contains various types of publications, datasets, people, organizations, simulation models, news, events and more. The content can be accessed through a dedicated web-based interface available at http://library.conservefloridawater.org as well as through parts of other web pages, standard formats and protocols. The main web interface has been available online since 2008 and has seen increasing use since then. Statistical analysis of the web use performed with the AWStats package reports about 14,000 unique visitors and over 76,000 visits to the web site in 2009. The two parts of the ontology, top-level and domain-specific, have different functions in the application.


Top-Level Ontology

The top-level ontology contains the classes of various resources collected in the library, for example: publication, book, report, person, organization, event, etc. The classes are related using taxonomic relationships, for example: book is a subclass of publication. Restrictions on properties are used, for example: title of publication must be a string, or first author of publication must be a person. Cardinalities are defined as well, for example: publication cannot have more than one first author. The top-level ontology is utilized for:
- Creation and verification of individuals: all properties must meet the respective class property restrictions, as shown in Figure 3-3;
- Displaying the individuals of a specific class: for example, all publication (and its subclasses) individuals can be displayed with the ability to narrow or widen the selection according to the taxonomic relationships (for example, show book individuals only);
- Organizing the displayed individuals according to their class: for example, when displaying one of the concepts, all related individuals are grouped by class, as shown in Figure 3-4.

Domain-Specific Ontology

The domain-specific ontology describes concepts and their relationships within the domain of urban water conservation for Florida. In the web interface the concepts are referred to as keywords, since this notion has proved most familiar for the majority of the library users. A definition property is provided for each concept, and relationships are used to relate concepts. For example, the concept best management practice is a subclass of the concept water conservation measure. Additionally, the library resources, represented by ontology individuals, are related to relevant concepts, for to water conservation measure concept. At its current stage the WCDL ontology


contains about 1000 concepts with definitions and 700 individuals. The domain-specific ontology is utilized for:
- Browsing through the concepts in the water conservation domain with the ability to display any of the related concepts, as shown in Figure 3-4;
- Showing library resources related to a concept and vice versa, see Figure 3-4;
- Finding more relevant resources than a traditional text search, as explained in the next section.

Ontology-Assisted Search

Major weaknesses of traditional full-text search methods are low precision and lack of structure, which can lead to many irrelevant search results and failure to locate relevant results (Beall, 2008; Beck & Pinto 2002; Moskovitch et al. 2007). Query expansion is a technique used to improve search performance, usually by adding more related keywords to the query (Yang et al. 2007). In the adapted approach a text search is enhanced with information stored in the ontology: the query is expanded using taxonomic and horizontal relationships. Given a query text (a string of characters entered by the user), the system performs the following tasks:
a. Library resources (publications, organizations, etc.) matching the query text are identified in a manner similar to a traditional text search;
b. Ontology concepts matching the query text are identified as in step a.; synonyms are included in the search;
c. For concepts identified in step b., the query is expanded with more specific concepts found in the ontology;
d. For concepts identified in steps b. and c., the query is expanded using horizontal relationships in the ontology; the results are compared with and without this step;
e. For concepts identified in steps b., c. and d., relevant resources are retrieved;
f. Finally, all retrieved concepts, followed by the resources grouped according to their class, are displayed as shown in Figure 3-5.
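Steps a. through f. can be illustrated with a minimal Python sketch. It is not the WCDL implementation (which the text describes as a Java application on top of Lyra); the toy concepts, relationships, resources and all function names below are hypothetical and chosen only to show how taxonomic and horizontal expansion change the result set.

```python
from collections import deque

# Hypothetical toy data; none of these entries come from the WCDL ontology itself.
NARROWER = {"plumbing fixture": ["toilet", "faucet"], "toilet": ["low flow toilet"]}
RELATED = {"faucet": ["indoor water use"]}          # horizontal relationships
SYNONYMS = {"plumbing device": "plumbing fixture"}  # synonym -> concept
CONCEPTS = {"plumbing fixture", "toilet", "low flow toilet", "faucet",
            "indoor water use", "irrigation"}
# Each resource: (title, class, concepts it is catalogued under)
RESOURCES = [
    ("Plumbing fixture standards", "report", ["plumbing fixture"]),
    ("Retrofitting toilets", "book", ["low flow toilet"]),
    ("Household audit guide", "fact sheet", ["indoor water use"]),
    ("Irrigation scheduling", "report", ["irrigation"]),
]

def match_concepts(query):
    """Step b: concepts (or their synonyms) containing the query text."""
    q = query.lower()
    hits = {c for c in CONCEPTS if q in c}
    hits |= {target for syn, target in SYNONYMS.items() if q in syn}
    return hits

def add_narrower(concepts):
    """Step c: add all more specific concepts, following the taxonomy downwards."""
    seen, queue = set(concepts), deque(concepts)
    while queue:
        for child in NARROWER.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def add_related(concepts):
    """Step d: add concepts reachable through one horizontal relationship."""
    return set(concepts) | {r for c in concepts for r in RELATED.get(c, [])}

def search(query, use_horizontal=True):
    q = query.lower()
    titles = {t for t, _, _ in RESOURCES if q in t.lower()}        # step a
    concepts = add_narrower(match_concepts(query))                 # steps b, c
    if use_horizontal:
        concepts = add_related(concepts)                           # step d
    titles |= {t for t, _, cs in RESOURCES if concepts & set(cs)}  # step e
    grouped = {}
    for title, cls, _ in RESOURCES:                                # step f
        if title in titles:
            grouped.setdefault(cls, []).append(title)
    return concepts, grouped
```

Calling `search("fixture", use_horizontal=False)` and `search("fixture")` on this toy data gives the taxonomy-only and the fully expanded result sets, which is one way to compare results with and without step d.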


System Architecture

The WCDL information management system was implemented using the Lyra ontology management system (Beck 2008), the Java programming platform (Oracle 2011), the XML language (Bray et al. 2008) and the XSLT transformation (Clark 1999) technologies. The Lyra ontology management system is a platform for developing and deploying ontology-based applications based on a formal ontology language. Lyra is an open-source environment, written entirely in Java, and has been previously used to develop a wide range of applications in agriculture and natural resources.

A Java servlet application was developed that queries the Lyra database according to the user request passed in the form of an HTTP URI and generates an XML representation of the requested data. The XML representation is then transformed using an XSLT style sheet into the required output format, as shown in Figure 3-6. This architecture leverages the flexibility and extensibility of the ontology KOS by providing the information in many formats and adhering to widely accepted standards as follows:
- most ontology features can be accessed in a dedicated library web site, as shown in Figure 3-3 and Figure 3-4;
- based on the ontology, many elements of the project main web site are dynamically generated, like current news and events;
- the ontology is used to generate RSS feeds according to the specification (RSS Advisory Board 2009);
- the library exposes its content as an Open Archives Initiative Data Provider (OAI 2008) using Dublin Core (DCMI 2011) metadata in the ontology.

Results of an evaluation of the WCDL system are presented in Chapter 5.


Table 3-1.
Ontology: OpenCyc | SUMO | WCDL
English: Water | Water | Water
English Aliases: H2O | H2O
Type: inanimate object | compound substance | Fluid
Subtypes: fresh water, ice, pure water, salt water, shallow water, water route, water table | freshwater, groundwater, ice, iceberg | aggressive water, blackwater, brackish water, developed water, freshwater, graywater, groundwater, hard water, import water, potable water, process wastewater, process water, rainwater, raw water, reclaimed water, running water, sparkling water, spring water, surface water
Other relationships: is used by: agriculture water use, irrigation, mining water use; is affected by: hydrologic cycle; has: water quality

Table 3-2.
Ontology: OpenCyc | SUMO | WCDL
English: PublishedMaterial | text | Publication
English Aliases: publications | document
Type: store of information | artifact | resource
Subtypes: book, recorded video product, recorded sound product | article, audio recording, book, certificate, form, joint publication, label, lyrics, motion picture, music, musical composition, narrative, paragraph, series, summary, treaty | article, book, booklet, citation, collection, fact sheet, journal, magazine, manual, minutes, misc, proceedings, report, slideshow, thesis


Figure 3-1. ObjectEditor authoring tool.

Figure 3-2. Book individual in OWL.
2006 A Guide to Florida Landscaping


Figure 3-3. Creating an individual of organization class.

Figure 3-4. Domain-specific ontology concept in the web interface.


Figure 3-5. Ontology-assisted search results.

Figure 3-6. System architecture diagram (components shown: editor computer with authoring tool, database server with database, web server with Java Servlet, XML data and XSLT stylesheet, end user computer with Internet browser; connections: RMI interface, HTTP URI, HTML page).


CHAPTER 4
INFORMATION EXTRACTION METHODOLOGY

The ontology application to the Water Conservation Digital Library presented in Chapter 3 and the results discussed in Chapter 5 demonstrated the need for automation of certain library maintenance tasks. Manual processing of a large and dynamic number of concepts required increased time and effort. The possibility of human error was substantial, as numerous inconsistencies in the ontology were found. Automation was needed for the process of cataloging new documents as well as adding new concepts to the ontology and identifying relationships. For this purpose, information extraction and computational linguistics techniques were researched and applied to the Best Management Practices (BMP) Publication Library, developed based on the experiences gathered in the WCDL project. The BMP Publication Library is a collection of about 600 documents, mostly journal articles and technical reports, available at http://lyra.ifas.ufl.edu/LIB. It was established to share current and past research efforts in the development of agricultural BMPs in Florida.

Initial Construction

First, a small ontology was built to manage the BMP Publication Library. It was based on the model developed previously for the WCDL and populated using automatic data ingestion methods. Twelve initial domain ontology concepts were created based on twelve by document authors, with the top five

For the purposes of the methodology described in this chapter, the text in a digital form was required for each document in the collection. Many of the documents in the BMP Publication Library had been scanned from paper sources, but optical character recognition (OCR) was not


performed for some of them, which meant that the document text was not available for analysis. To remedy this issue the whole collection was processed with the help of University of Florida Digital Collections (UFDC) using state-of-the-art PrimeOCR software (PrimeRecognition 2011), which produced high-quality text for all documents in the collection.

Along with the documents, a set of text files with document abstracts was available that had been created in a previous stage of the project. The files included document metadata, such as title, authors, year, source, keywords and abstract, in a format that was fairly consistent among the files. This allowed for ingesting the information and creating the ontology individuals. A Java application was developed to automatically create individuals of document and author classes and their interrelationships. The application parsed the text using regular expressions and the common structure of the files; for example, the title was in the first line, the authors in the second, etc. A number of errors and inconsistencies in the text files were discovered that required some manual inspection and correction. Four hundred twelve document individuals were created with the following attributes: title, year, source, abstract and keywords. Five hundred twenty-four author individuals were created with the following attributes: first name, last name and suffix.

Title Parsing With the Stanford Parser

Document titles carry the most essential information about the publication contents. An experiment was performed to discover potential ontology concepts in document titles with the aid of CL techniques. The Stanford Parser, a Java implementation of a probabilistic natural language parser, was used to analyze titles of publications in the library. The package provides an accurate probabilistic context-free


grammar (PCFG) parser and comes supplied with a well-engineered English grammar based on the Penn Treebank corpus (Marcus et al. 1993, University of Pennsylvania 1999). The Stanford Parser accepts a simple string (untokenized, untagged) as the input and can output the k best parses (with highest probabilities) of the given string. The output also includes Stanford typed dependencies, which were designed to provide a simple description of the grammatical relationships in a sentence. These dependencies were used to extract noun compound modifiers, the candidate ontology concepts.

A Java application was developed to analyze document titles. The application used libraries available in the Stanford Parser package with the English PCFG grammar in order to perform CL analysis. Methods in the LexicalizedParser class were used, including getBestPCFGParse(), as well as allTypedDependencies() in the GrammaticalStructure class. Stanford typed dependencies were obtained for each title. Noun compound modifiers were filtered from the dependencies. A noun compound modifier of a noun phrase is any noun that serves to modify the head noun (Marneffe & Manning 2008).

For example, for the publication title Capturing Water with Rain Barrels, the parse with the highest probability is presented in Figure 4-1. The part-of-speech tags and phrasal categories come from the Penn Treebank corpus. The tags used in the parse tree in Figure 4-1 are:
- VP: Verb phrase
- VBG: Verb, gerund or present participle
- NP: Noun phrase
- NNP: Proper noun, singular
- PP: Prepositional phrase
- IN: Preposition or subordinating conjunction
- NN: Noun, singular or mass
- NNS: Noun, plural

The Stanford typed dependencies for the analyzed title are:


dobj(Capturing-1, water-2)
nn(barrels-5, rain-4)
prep_with(Capturing-1, barrels-5)

Noun compound modifiers are denoted by nn; thus there is one candidate ontology concept extracted from the above example: rain barrels. The results of title parsing are discussed in Chapter 5.

Frequency Analysis

A statistical approach to information extraction based on the analysis of word frequency and co-occurrence patterns was adapted. The analysis was performed in the following steps:
- Concept extraction: identification of the most frequent concepts in the corpus;
- Semantic similarity: calculation of similarity measures for concepts extracted in the previous step;
- Concept-document relevance: calculation of relevance measures between documents and concepts.

Concept Extraction

Document corpus. In the first step of the frequency analysis the most frequent concepts in the document corpus were extracted. Each document was tokenized using white space and punctuation marks. Light stemming was performed to take care of noun plural forms, as most of the concepts are noun phrases. All n-grams (subsequences of n words) up to 6 words in width were matched against the NAL Thesaurus. At this point the NAL Thesaurus was used as a lexicon: only concepts that occur in the Thesaurus were used in further analysis. The method resulted in extraction of over 5000 concepts that appear more than once in the document corpus.

Web corpus. The performance of statistical methods depends on the amount of data processed. By using information available on the Internet rather than in a single


library, the amount of resources available for the analysis increases vastly. As discussed in Chapter 2, many recent studies report successful use of the web as a corpus. Compared to the static and size-limited corpus of documents in the BMP Publication Library, the advantages of the web corpus include large size, diversified language and up-to-date language.

The web corpus was generated with the use of the AlltheWeb search engine (Yahoo! Inc. 2011). A series of queries was run with the top concepts extracted from the document corpus in the previous step: concepts that occurred more than 100 times were used. The top 100 search results for each concept were retrieved, yielding about 85,000 snippets. Each result was a snippet consisting of the search term and neighboring words; below is an example (original spelling was preserved as provided by the search engine):

Learn about garden and lawn irrigation systems: drip irrigation supplies, sprinkler systems, garden hose nozzels and garden hose caddy, rain barrels, lawn sprinklers

The snippets were processed in a similar manner as the original documents: as described in the previous section, n-grams matching the NAL Thesaurus were extracted. The method resulted in retrieval of over 7000 concepts that appear more than once in the web corpus.

Table 4-1 summarizes the concepts extracted from the two corpora. Over 74% of the concepts retrieved from the document corpus are included in the concepts from the web corpus, while 54% of the web concepts are included in the document concepts.

Semantic Similarity

In the second step of the frequency analysis a VSM was used to quantify semantic similarity between concepts. This technique assumes that semantically similar words tend to occur in similar contexts. The context was defined as the whole document. First,


a co-occurrence matrix, in this case a document-by-concept binary matrix, was built as shown in Table 4-2. Each row in the matrix represents a document, each column represents a concept, and the value equals 1 if the concept occurs in the document, 0 otherwise, as shown in the equation below:

$$\mathrm{OCC}(Doc_i, Concept_j) = \begin{cases} 1 & \text{if } Concept_j \text{ occurs in } Doc_i \\ 0 & \text{otherwise} \end{cases}$$

Table 4-3 presents an example of such a matrix with three concepts and three documents. The concept citrus occurs in Doc1, Doc2 and Doc3, so it can be represented by a binary vector citrus = (1, 1, 1), which can be expressed by a set citrus: {Doc1, Doc2, Doc3}, while nitrogen: {Doc1, Doc2} and irrigation: {Doc3}.

To assess the similarity of binary vectors, several measures are used in CL practice: matching coefficient (MC), Dice coefficient, Jaccard coefficient, overlap coefficient and cosine. Two of these measures were chosen for further analysis: MC, being the simplest, and cosine, because of certain characteristics important in statistical analysis (Manning & Schütze 1999). MC for two concepts counts the number of documents both concepts occur in together; the higher the value, the more similar the two concepts are. The major drawback of this measure is that it favors concepts that are frequent in the whole corpus. Cosine adjusts the MC value by the numbers of documents each of the two concepts occurs in, but the adjustment is smaller in cases when these numbers differ from each other. The similarity measures for two vectors represented by sets X and Y are defined in set operations as shown below:

$$\mathrm{MC}(X, Y) = |X \cap Y|$$

$$\mathrm{cosine}(X, Y) = \frac{|X \cap Y|}{\sqrt{|X| \times |Y|}}$$
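The two measures can be checked with a short Python sketch using the example of Table 4-3, in which citrus occurs in all three documents, nitrogen in Doc1 and Doc2, and irrigation in Doc3 only. The function names are illustrative; the set-based cosine used here, |X ∩ Y| / sqrt(|X| × |Y|), is the usual form for binary vectors.

```python
import math

# Document sets read off the columns of Table 4-3.
occurrences = {
    "citrus": {"Doc1", "Doc2", "Doc3"},
    "nitrogen": {"Doc1", "Doc2"},
    "irrigation": {"Doc3"},
}

def matching_coefficient(x, y):
    """Number of documents in which both concepts occur."""
    return len(x & y)

def cosine(x, y):
    """Matching coefficient normalized by the geometric mean of the set sizes."""
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))

for other in ("nitrogen", "irrigation"):
    mc = matching_coefficient(occurrences["citrus"], occurrences[other])
    cos = cosine(occurrences["citrus"], occurrences[other])
    print(f"citrus vs {other}: MC = {mc}, cosine = {cos:.2f}")
```

For the citrus and nitrogen pair this gives MC = 2 and cosine = 2 / sqrt(3 × 2), about 0.82; for citrus and irrigation it gives MC = 1 and cosine = 1 / sqrt(3 × 1), about 0.58.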


By applying the above definitions to the example from Table 4-3, the following similarity values are obtained:
matching coefficient(citrus, nitrogen) = 2
matching coefficient(citrus, irrigation) = 1
cosine(citrus, nitrogen) = 0.82
cosine(citrus, irrigation) = 0.58

MC and cosine were used to build a concept-by-concept similarity matrix for all extracted concepts, as shown in Table 4-4. Each row and column in the matrix represents a concept, and the values measure semantic similarity between the two concepts. The matrix provides a similarity measure for each pair of concepts. Two versions of the matrix were calculated, one with MC and one with cosine values. For example, in the document corpus, according to MC, the most similar concepts to citrus are: Florida, water, research, management, soil, methods, time, area, trees and surfaces. According to cosine, they would be: groves, trees, oranges, lakes, Florida, fruits, rootstocks, management, research and water. As expected from the definition of the two measures, MC returned very general concepts that occur frequently in the corpus, while cosine returned concepts much more closely related to citrus.

Document-Concept Relevance

In the third step of the frequency analysis the most relevant concepts for each document were identified. A measure widely used in IR systems is Term Frequency-Inverse Document Frequency (TF-IDF). TF-IDF provides values for each concept in a document through an inverse proportion of the frequency of the concept in a particular document to the percentage of documents the concept appears in (Manning et al.


2008). Concepts with high TF-IDF values have a strong relationship with the document they appear in.

A document-by-concept frequency matrix was built as shown in Table 4-5. Each row in the matrix represents a document, each column represents a concept, and the values are the numbers of times each concept occurs in each document. A TF-IDF matrix was built as shown in Table 4-6. The values in the table are TF-IDF measures for each concept and document, calculated according to the following equation:

$$\mathrm{TF\text{-}IDF}(Doc_i, Concept_j) = \mathrm{TF}(Doc_i, Concept_j) \times \log \frac{N}{DF_j}$$

TF(Doc_i, Concept_j): frequency of concept j in document i (from Table 4-5)
DF_j: number of documents in the collection containing concept j
N: number of documents in the collection

An improvement to ordinary document-concept relevance measures (TF and TF-IDF) is proposed that takes into account the semantic similarity information described in the previous section. It is based on the assumption that concepts similar to the ones that were found relevant to a particular document are also relevant to this document. The relevance of a concept C to a document D is adjusted according to the similarity of C to each of the other concepts C_k and their relevance to D. The more similar C_k is to C and the more relevant C_k is to D, the bigger the adjustment. This way all other concepts in following equation:


voteREL: vote-adjusted relevance measure developed in this work,
REL: document-concept relevance measure, such as TF or TF-IDF,
SIM: concept-concept similarity measure.

The voting approach was first applied to simple document-by-concept TF frequencies. A vote-adjusted frequency matrix was built as shown in Table 4-7. The values are based on TF frequencies from Table 4-5 and concept similarity from Table 4-4. Subsequently, the voting approach was applied to TF-IDF measures. A vote-adjusted TF-IDF matrix was built as shown in Table 4-8. The values are based on TF-IDF measures from Table 4-6 and concept similarity from Table 4-4.

Finally, for each document, a list of the most relevant concepts was determined according to four approaches: ordinary frequencies from Table 4-5, ordinary TF-IDF from Table 4-6, vote-adjusted frequencies from Table 4-7 and vote-adjusted TF-IDF from Table 4-8. Results of this analysis are presented in the next chapter.

Table 4-1. Summary of concept extraction.
Corpus: Documents | Web | Overlap
Number of concepts: 5187 | 7126 | 3851
Overlap %: 74% | 54%

Table 4-2. Definition of document-by-concept binary matrix.
Columns: Concept1 | Concept2
Doc1: OCC(Doc1, Concept1) | OCC(Doc1, Concept2)
Doc2: OCC(Doc2, Concept1)
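The concept extraction summarized in Table 4-1 and the binary matrix defined in Table 4-2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in this work: the small lexicon stands in for the NAL Thesaurus, the plural-stripping rule is a deliberately crude stand-in for the light stemming, and all function names are hypothetical.

```python
import re

def light_stem(token):
    """Crude plural stripping (assumption): drop a trailing 's' but not 'ss'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def tokenize(text):
    """Split on white space and punctuation, lowercase, then stem."""
    return [light_stem(t) for t in re.split(r"[\W_]+", text.lower()) if t]

def normalize_lexicon(terms):
    """Apply the same tokenization and stemming to the lexicon entries."""
    return {" ".join(tokenize(term)) for term in terms}

def extract_concepts(text, lexicon, max_n=6):
    """Count every n-gram of up to max_n words that appears in the lexicon."""
    tokens = tokenize(text)
    counts = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in lexicon:
                counts[gram] = counts.get(gram, 0) + 1
    return counts

def occurrence_matrix(docs, lexicon):
    """Document-by-concept binary matrix: 1 if the concept occurs, else 0."""
    concepts = sorted(lexicon)
    rows = {}
    for name, text in docs.items():
        found = extract_concepts(text, lexicon)
        rows[name] = [1 if c in found else 0 for c in concepts]
    return concepts, rows

# Hypothetical stand-in for the thesaurus and two toy documents.
lexicon = normalize_lexicon(["rain barrels", "irrigation", "drip irrigation"])
docs = {
    "Doc1": "Rain barrels reduce irrigation demand. "
            "Drip irrigation and rain barrels save water.",
    "Doc2": "Nitrogen leaching under citrus.",
}
concepts, rows = occurrence_matrix(docs, lexicon)
```

Note that nested matches are all counted in this sketch: an occurrence of "drip irrigation" also counts as an occurrence of "irrigation". Whether the original application handled nested n-grams this way is not stated in the text.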


Table 4-3. Example of document-by-concept binary matrix.
Columns: citrus | nitrogen | irrigation
Doc1: 1 | 1 | 0
Doc2: 1 | 1 | 0
Doc3: 1 | 0 | 1

Table 4-4. Concept-by-concept similarity matrix.
Columns: Concept1 | Concept2
Concept1: SIM(Concept1, Concept1) | SIM(Concept1, Concept2)
Concept2: SIM(Concept2, Concept1)

Table 4-5. Document-by-concept frequency matrix.
Columns: Concept1 | Concept2
Doc1: TF(Doc1, Concept1) | TF(Doc1, Concept2)
Doc2: TF(Doc2, Concept1)

Table 4-6. TF-IDF matrix.
Columns: Concept1 | Concept2
Doc1: TF-IDF(Doc1, Concept1) | TF-IDF(Doc1, Concept2)
Doc2: TF-IDF(Doc2, Concept1)

Table 4-7. Vote-adjusted frequency matrix.
Columns: Concept1 | Concept2
Doc1: voteTF(Doc1, Concept1) | voteTF(Doc1, Concept2)
Doc2: voteTF(Doc2, Concept1)
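The step from the frequency matrix of Table 4-5 to the TF-IDF matrix of Table 4-6 can be sketched in Python. The sketch assumes the common weighting TF × log(N / DF) with a natural logarithm; the exact variant and log base used in this work are assumptions here, and the frequencies below are made up for illustration.

```python
import math

def tf_idf_matrix(tf):
    """tf maps document -> {concept: frequency}; returns the same shape with TF-IDF values."""
    n_docs = len(tf)
    # DF: number of documents in which each concept occurs at least once.
    df = {}
    for counts in tf.values():
        for concept, freq in counts.items():
            if freq > 0:
                df[concept] = df.get(concept, 0) + 1
    return {
        doc: {c: f * math.log(n_docs / df[c]) for c, f in counts.items() if f > 0}
        for doc, counts in tf.items()
    }

# Hypothetical frequencies, not taken from the BMP Publication Library.
tf = {
    "Doc1": {"citrus": 4, "nitrogen": 2},
    "Doc2": {"citrus": 1, "nitrogen": 3},
    "Doc3": {"citrus": 2, "irrigation": 5},
}
weights = tf_idf_matrix(tf)
# Most relevant concept per document, ranked by TF-IDF.
top = {doc: max(row, key=row.get) for doc, row in weights.items()}
```

In this toy collection citrus occurs in every document, so its weight is zero everywhere, while irrigation, which occurs only in Doc3, receives the highest weight there. This mirrors the contrast noted above between general concepts that are frequent in the whole corpus and concepts specific to a document.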


Table 4-8. Vote-adjusted TF-IDF matrix.
Columns: Concept1 | Concept2
Doc1: voteTF-IDF(Doc1, Concept1) | voteTF-IDF(Doc1, Concept2)
Doc2: voteTF-IDF(Doc2, Concept1)

Figure 4-1. Capturing Water with Rain Barrels
(ROOT (VP (VBG Capturing) (NP (NP (NNP Water)) (PP (IN with) (NP (NN Rain) (NNS Barrels))))))
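For the title parsed in Figure 4-1, the filtering of noun compound modifiers can be sketched in Python. The sketch does not call the Stanford Parser (the application described in this chapter was written in Java); it only post-processes typed dependencies given as text in the parser's relation(governor-index, dependent-index) notation. The function name and the second example are hypothetical.

```python
import re

# One typed dependency, e.g. "nn(barrels-5, rain-4)".
DEP_PATTERN = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")

def noun_compounds(dependencies):
    """Collect nn modifiers per head noun and return them as candidate concepts."""
    modifiers = {}
    for dep in dependencies:
        match = DEP_PATTERN.fullmatch(dep.strip())
        if match is None or match.group(1) != "nn":
            continue
        _, head, head_idx, mod, mod_idx = match.groups()
        modifiers.setdefault((int(head_idx), head), []).append((int(mod_idx), mod))
    concepts = []
    for (_, head), mods in sorted(modifiers.items()):
        # Modifiers in sentence order, followed by the head noun.
        words = [word for _, word in sorted(mods)] + [head]
        concepts.append(" ".join(words).lower())
    return concepts

title_dependencies = [
    "dobj(Capturing-1, water-2)",
    "nn(barrels-5, rain-4)",
    "prep_with(Capturing-1, barrels-5)",
]
print(noun_compounds(title_dependencies))
```

Grouping modifiers by head also covers compounds with more than one modifier: the dependencies nn(management-3, water-1) and nn(management-3, quality-2) would be returned as the single candidate "water quality management".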


CHAPTER 5
RESULTS AND DISCUSSION

This chapter is organized as follows: first, the ontology developed to manage the Water Conservation Digital Library is described with examples of the ontology application, followed by an evaluation of search results based on that ontology. Next, the results of automatic techniques that facilitate ontology authoring are discussed. Parsing document titles to extract ontology concepts and frequency analysis techniques are evaluated.

Water Conservation Ontology

The Water Conservation Digital Library ontology contains over 1000 concepts from the water conservation domain, about 35 top-level concepts (shown in Figure 5-1) representing the types of resources stored in the library, and over 1000 individuals, including over 400 publications. The ontology provides content for a dedicated web-based interface available at http://library.conservefloridawater.org as well as for other web pages, standard formats and protocols such as RSS and OAI. The main web interface has been available online since 2008 and has seen increasing use since then. Statistical analysis of the web use performed with the AWStats package reports about 14,000 unique visitors and over 76,000 visits to the web site in 2009.

Figure 5-2 shows a fragment of the domain-specific ontology: the neighborhood of the precipitation concept with more specific concepts, such as rain, hail and net precipitation. The web interface page corresponding to the precipitation concept is shown in Figure 5-3. Figure 5-4 shows the Lawn Sprinkler Selection and Layout for Uniform Water Application publication individual with three related domain-specific concepts and three author individuals. The web interface page corresponding to the publication, showing its

PAGE 66

relationships and metadata is presented in Figure 5-5. To put these small ontology fragments in a larger context, Figure 5-6 was created, showing the domain-specific ontology with related individuals. The image covers about 75% of the current ontology and has been assembled for illustration purposes only.

During the ontology development process a number of problems were encountered, especially when the number of concepts increased above a certain level. There were difficulties with keeping the ontology consistent: duplicate items appeared, relationships with existing concepts were missed when adding new ones, and accurate manual cataloging of resources was problematic. The above issues were, in large part, caused by the limitations of the authoring tool, ObjectEditor, used for creation and editing of the ontology, as shown in Figure 5-6. The tool, due to its design, is only suitable for working with a limited number of concepts. An alternative authoring tool, LyraBrowser, shown in Figure 5-7, was in development at the time. It would have made it easier to manually find inconsistencies in the ontology and correct them. No matter how good the authoring tool is, however, manual processing of a large and dynamic number of concepts requires increased time and effort, and the possibility of human error grows significantly. To address the above issues, automatic methods that facilitate concept discovery and assess concept similarity were tested. The results of these experiments are discussed in further sections of this chapter.

Ontology-Assisted Search

To illustrate the ontology benefits for IR, an example text search was performed. The system was queried for the word "plumbing". This query was suggested by domain experts. Three sets of search results were obtained:

PAGE 67

a baseline set, obtained without query expansion, shown in column 1 in Table 5-1;
a taxonomy-assisted set, in which the query was expanded with taxonomical relationships from the ontology, shown in columns 1 and 2 in Table 5-1;
an ontology-assisted set, in which the query was further expanded with horizontal relationships from the ontology, shown in columns 1, 2 and 3 in Table 5-1.

The results are summarized in Table 5-1: the first column shows the baseline results, the second column presents additional results obtained with the use of taxonomical relationships, and the third column shows additional results obtained with the use of horizontal relationships. The full ontology-assisted search result set comprises all three columns in Table 5-1. Additionally, the same query was used with Google Site Search (Google Inc. 2011), a traditional text search restricted to the Water Conservation Digital Library website. The results of Google Site Search are presented in Table 5-2.

The baseline search retrieved 2 keywords and 5 publications, compared to 16 keywords and 29 publications in the taxonomy-assisted set and 21 keywords and 37 publications in the full ontology-assisted set. The majority of the additional results do not contain the query word in their text, so the baseline search was not able to retrieve them. They are, however, strongly related to the query text: most of them pertain to some type of plumbing fixture. Such resources could only be retrieved with the aid of relationships in the ontology. The example indicates much better recall of the expanded query, while retaining the precision, versus the baseline search. However, there might be cases when the query expansion yields too many results for a particular user. To address this issue, the search interface might provide the user with a choice whether to expand the query or not.
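The relationship between the three result sets can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: the concept names are borrowed from Table 5-1, but the narrower-term and related-term links between them, and all function names, are hypothetical and do not reproduce the actual ontology or the implementation of the system.

```python
from collections import deque

# Hypothetical ontology fragment. Concept names echo Table 5-1, but the
# links between them are assumed for illustration only.
NARROWER = {
    "plumbing": ["plumbing fixture"],
    "plumbing fixture": ["toilet", "urinal", "showerhead"],
    "toilet": ["high efficiency toilets"],
}
RELATED = {
    "toilet": ["flapper valve", "flush"],
}


def all_concepts():
    """Collect every concept name mentioned in either relation."""
    names = set()
    for relation in (NARROWER, RELATED):
        for parent, children in relation.items():
            names.add(parent)
            names.update(children)
    return names


def baseline_search(query):
    """Plain text match: concepts whose label contains the query word."""
    return {c for c in all_concepts() if query in c.split()}


def taxonomy_search(query):
    """Baseline matches plus everything below them in the taxonomy."""
    found = set(baseline_search(query))
    queue = deque(found)
    while queue:
        for child in NARROWER.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def ontology_search(query):
    """Taxonomy-assisted matches plus their horizontally related concepts."""
    found = taxonomy_search(query)
    for concept in list(found):
        found.update(RELATED.get(concept, []))
    return found
```

In this toy fragment each set contains the previous one, which mirrors the way the columns of Table 5-1 accumulate; offering the user a choice of expansion level would amount to choosing which of the three functions to call.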

PAGE 68

Google Site Search retrieved 67 items that are a mixture of publications, keywords, websites, organizations, keyword indexes, etc. The way the results are presented lacks any structure; each of the items needs to be opened in order to identify the type of information it represents (with the exception of PDF documents). In contrast, the ontology-assisted search results are presented in an organized and structured fashion, grouped by information type, making them easier to use.

Furthermore, Google results contain many general publications that are not specific to plumbing. How relevant these items are for a particular user is subjective, but the larger the result set, the more they may hamper the search process. These results add to the number of items that need to be processed by the user, pushing other, possibly more relevant results towards the end. On the other hand, Google missed some important publications, for example "Single Family Residential Toilet Rebate Program Evaluation", in which the query word occurs few or no times in text. Only by incorporating semantics of the ontology in the retrieval process (for example, the knowledge that HET is a toilet and that a toilet is a plumbing fixture) can many relevant documents be retrieved. Moreover, showing ontology concepts relevant to the search query in the results may encourage the user to further explore the ontology while searching for information.

It must be noted that the results of the approach are highly dependent on the ontology structure and, most importantly, accurate cataloging of resources under relevant ontology concepts. The ontology structure is arbitrary to a large extent; every

PAGE 69

expert and user may look at knowledge from a different angle, resulting in a different structure of the ontology. The same is true for cataloging resources: it depends on the judgment of which concepts are relevant to a particular document. All of this impacts the results of the ontology-assisted search.

Title Parsing

Two experiments were performed to test the usefulness of the Stanford Parser in concept extraction. The parser was used to extract two-word noun phrases from document titles. In the first experiment the Water Conservation Digital Library collection was analyzed. About 300 publication titles were parsed, yielding about 600 concepts; the complete list is included in Appendix A. In the second experiment documents from the BMP Publication Library were analyzed. About 400 publication titles were parsed, yielding about 650 concepts, shown in Appendix B.

The noun phrases were extracted from the most probable parse for each title. For example, for the publication title "Advanced Guidelines for Preparing Water Conservation Plans" the most probable parse is shown in Figure 5-8. For a given parse, the Stanford Parser provides Stanford typed dependencies, which were designed to provide a simple description of the grammatical relationships in a sentence. The Stanford typed dependencies for the parse shown in Figure 5-8 are:

dep(Advanced-1, Guidelines-2)
nn(Plans-7, Preparing-4)
nn(Plans-7, Water-5)
nn(Plans-7, Conservation-6)
prep_for(Guidelines-2, Plans-7)

Noun compound modifiers are denoted by nn, thus the noun phrases identified in the above example are: preparing plans, water plans and conservation plans. In case of

PAGE 70

multiple nouns preceding each other (such as preparing water conservation plans in the above example), the only head noun identified by the Stanford Parser is the last one (plans in the example above). Because of this trait, the parser was unable to identify important concepts, such as water conservation in the above example. A possible reason for this issue is the standard grammar used by the parser; this could be addressed by providing a domain-specific grammar that reflects the specifics of the text collection. Development of such a grammar is proposed as one of the next steps to continue this work.

Another example is a document from the BMP Publication Library entitled "Herbicide Transport in a Restored Riparian Forest Buffer System". The most probable parse for this title is shown in Figure 5-9. The Stanford typed dependencies for the above parse are:

nn(Transport-2, Herbicide-1)
det(System-9, a-4)
nn(System-9, Restored-5)
nn(System-9, Riparian-6)
nn(System-9, Forest-7)
nn(System-9, Buffer-8)
prep_in(Transport-2, System-9)

The phrases identified in the above example are: herbicide transport, riparian system, forest system and buffer system. Similarly to the previous example, the parser was unable to identify an important concept: riparian forest.

A proper evaluation of concepts extracted using the parser can only be performed in a survey involving multiple domain experts. It is proposed as one of the next steps to continue this work. In order to perform a rough quantitative evaluation, it was counted how many times each concept occurs in the abstract of the publication it was extracted

PAGE 71

from. It is assumed that the more relevant the concept, the more times it should occur in the abstract. Out of the 4 concepts identified in the above example, 2 were found in the abstract: herbicide transport occurred twice and buffer system four times.

A summary of concepts extracted from the BMP Publication Library is presented in Figure 5-10. The x-axis indicates how many times a concept occurred in the abstract and the y-axis shows how many concepts occurred a particular number of times; for example, the second tallest bar indicates that 88 concepts occurred once in the abstract, the third bar shows that 40 concepts occurred twice, etc. Out of 648 concepts, 474 were not found in the corresponding abstract (represented by the tallest bar in the graph), 88 occurred once in the abstract and another 88 occurred more than once. The most frequent ones were water quality and water requirements; these concepts occurred 23 and 18 times respectively.

The results show that over 73% of concepts extracted using the Stanford Parser do not occur in the abstract of the document they were extracted from. This result, combined with the issues observed for the titles analyzed manually, indicates that the methodology of extracting concepts using the Stanford Parser needs further improvements, such as a domain-specific grammar, to be useful for the IE purposes of this work.

Frequency Analysis

Statistical techniques based on frequency analysis were applied to the BMP Publication Library collection. The most frequent concepts were extracted from two corpora. Over 5,000 concepts were extracted from the original documents and over 7,000 from a corpus generated with the use of a web search engine. Frequency is defined as the number of times the term occurs in a corpus. The list of 200 most

PAGE 72

frequent concepts for both corpora is presented in Appendix C. The top 10 concepts from this list are shown in Table 5-3.

Zipf (1949) observed that, in natural language, there are few frequent terms and many rare terms. This observation, known as Zipf's law, states that the frequency of word tokens in a large corpus of natural language is inversely proportional to the rank when the terms are ranked in descending order of frequency. When the frequency is plotted versus rank on a log-log graph, a straight line with a slope of -1 is obtained, often called a Zipf curve. Figure 5-11 presents a Zipf curve (straight line) and the frequency distribution of terms from 810,000 Reuters news stories in the English language (curved line). Figure 5-12 illustrates the distribution of frequency for concepts extracted from the BMP Publication Library document corpus and the web corpus. The distribution for the web corpus was similar to the one of the documents, and both corpora reflect the frequency distribution of large English language corpora used in CL practice.

An evaluation of similarity measures used in combination with two corpora, as proposed in the methodology, was performed. The NAL Thesaurus was used as a reference. Thesauri are used in CL practice to measure semantic similarity (Jarmasz and Szpakowicz 2003, Resnik 1995), so the domain-specific NAL Thesaurus seems well suited for the given collection of texts. The semantic similarity between two terms was approximated by a distance measure in the Thesaurus. The NAL distance is defined as the number of edges in the shortest path between the two terms in the NAL Thesaurus: the shorter the path, the

PAGE 73

more similar the terms. For example, for the pair citrus and grapefruits the NAL distance is 2, and for the pair citrus and trees the NAL distance is 4. Figure 5-13 shows a fragment of the NAL Thesaurus used to calculate the NAL distances for the above two example pairs.

The 100 most frequent concepts were extracted from the original documents. The NAL Thesaurus was used as a lexicon for the extraction process: only concepts contained in the Thesaurus were considered. A list of all unique pairs from the top 100 concepts was generated (yielding 9900 pairs), and for each pair the following measures were calculated: MC based on the document corpus, MC based on the web search corpus, cosine based on the document corpus, and cosine based on the web search corpus. Each of the above measures was then compared against the NAL distance, used as a reference measure for each concept pair. The calculated NAL distance values were in the range between 1 and 10. All concept pairs with the same NAL distance were grouped together and the average of each measure was calculated for each group. For example, for concept pairs with NAL distance equal to 1 the average cosine based on the document corpus was 516, for NAL distance 2 the cosine was 537, for 3 it was 474, etc.

A summary of the results is presented in Figure 5-14. Each point in the figure represents a set of concept pairs with a particular NAL distance (shown on the y-axis); the average of the particular similarity measure for the group is shown on the x-axis. For example, the top leftmost point in Figure 5-14A shows that for all concept pairs with NAL distance equal to 10 the average value of MC was 95. The similarity measure should reflect how close the meaning of the two concepts is, while the NAL distance tells how distant it is. Therefore a good measure should be

PAGE 74

inversely proportional to the NAL distance. A power function was fitted to the graph for each similarity measure. MC based on the document corpus, with R² = 0.002, appears unrelated to NAL distance; cosine based on the same corpus, with R² = 0.72, shows improvement. Using the web search snippets corpus for calculating similarity measures shows significant improvement, and the cosine measure, with R² = 0.97, appears most strongly related to NAL distance. It outperforms MC especially for the most similar pairs (ones with lowest NAL distance and highest similarity measure), and these should be of most interest to the library users. The improvement of the web corpus compared to the document corpus can be explained largely by its size: there are about 600 documents and 85,000 web search snippets. This significantly improves the performance of statistical measures, such as cosine.

Another comparison of the 4 similarity measures was performed, focusing on the top 10 most similar concepts, as these are most valuable when searching for information. For each of the 100 most frequent concepts, the 10 most similar concepts were extracted according to MC and cosine, based on the document and web corpora. The average NAL distance was calculated for each group of the 10 most similar concepts. In concordance with previous results, cosine proved to be a more precise measure than MC, and it yielded a lower average NAL distance in most cases. Furthermore, web-based similarity yielded concepts much more closely related and specific than the ones based on original documents: the NAL distance obtained with web-based similarity measures was significantly lower than the distance for document-based measures. Three examples are presented in Tables 5-4, 5-5 and 5-6, showing the 10 most similar concepts to

PAGE 75

citrus, nitrogen and irrigation obtained using the four approaches. In each case, document-based MC yielded very general concepts, such as water, Florida or research. On the other hand, web-based cosine produced very specific concepts, such as various types of citrus in the last row of Table 5-4, processes and substances that involve or contain nitrogen in the last row of Table 5-5, or types of irrigation and irrigation equipment in the last row of Table 5-6. In each case, the web-based cosine approach resulted in the lowest average NAL distance, indicating that the top 10 concepts according to this approach are closer in the NAL Thesaurus than those produced by any of the other methods. The results suggest that the cosine similarity measure based on the web corpus carries the most valuable information for search and retrieval.
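The tendency of MC to favor very general concepts can be illustrated with a toy example. The sketch below assumes a binary vector space model in which each concept is represented by the set of texts it occurs in, the matching coefficient is the size of the overlap, and the cosine is the overlap normalized by the sizes of both sets. The four toy texts and all names are hypothetical, and the weighting actually used in this work may differ.

```python
import math

# Toy corpus (hypothetical): each text is the set of concepts found in it.
DOCS = [
    {"citrus", "oranges", "water"},
    {"citrus", "oranges", "grapefruits"},
    {"citrus", "water", "soil"},
    {"water", "soil", "irrigation"},
]


def occurrences(concept, docs):
    """Indices of the texts in which the concept occurs."""
    return {i for i, doc in enumerate(docs) if concept in doc}


def matching_coefficient(a, b, docs):
    """Number of texts in which both concepts occur."""
    return len(occurrences(a, docs) & occurrences(b, docs))


def cosine(a, b, docs):
    """Cosine of the two binary occurrence vectors."""
    x, y = occurrences(a, docs), occurrences(b, docs)
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))


for other in ("oranges", "water"):
    print(other,
          matching_coefficient("citrus", other, DOCS),
          round(cosine("citrus", other, DOCS), 3))
```

In this toy corpus MC cannot tell oranges from water as a neighbor of citrus, because both co-occur with it twice, while the cosine ranks oranges higher because water also appears in a text unrelated to citrus. This is one plausible reading of why a frequent, general term scores well under MC.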

PAGE 76

Table 5-1. Results of three search approaches

Keywords

Baseline Search
1. plumbing
2. plumbing fixture

Taxonomy-assisted Search (additional results)
3. blow out toilet
4. composting toilets
5. flushless toilet
6. flushometer
7. flush type urinals
8. high efficiency toilets
9. high efficiency urinals
10. non water using type urinals
11. showerhead
12. spa
13. tap
14. toilet
15. ultra low flush toilets
16. urinal

Ontology-assisted Search (additional results)
17. flush
18. flapper valve
19. ultra low flush (ULF) toilet rebates or retrofits
20. unified north american requirements
21. urinal rebates or retrofits

Publications

Baseline Search
1. BMP 6_Fact Sheet Faucets WORKING PAPER
2. City of Temple Terrace Water Conservation Program Plumbing Retrofit Project
3. Domestic Water Conservation Technologies
4. High Efficiency Plumbing Fixtures Toilets and Urinals
5. Residential Indoor Water Conservation Study: Evaluation of high efficiency indoor plumbing fixture retrofits in Single family homes in the East Bay municipal utility district service area
Low Flow Plumbing Fixtures (website)

Taxonomy-assisted Search (additional results)
6. A Primer on HETs
7. BMP 5_Water Use by Urinals WORKING PAPER
8. BMP 7 Showerheads WORKING PAPER
9. Dual Flush Toilet Project
10-12. Great Toilet Rebate Program (3 versions)
13. High Efficiency Toilets (HETs)
14. High Efficiency Toilets (HETs) USA and Canada
15-19. Hillsborough County Ultra Low Flow Rebate Program (5 versions)
20. Testing a new and better measurement of toilet performance
21. Maximum Performance (MaP) Testing of Popular Toilet Models
22. Pinellas County Utilities Water Conservation Opportunities Study
23. Single Family Residential Toilet Rebate Program Evaluation
24. St. Petersburg Ultra Low Flow Toilet and Water Use Evaluation Rebate Project
25. The Real High Efficiency Toilets Have Arrived
26. Toilets
27. Urinals
28. WaterWiser Glossary of Common Water Terms
29. Water Words Dictionary

Ontology-assisted Search (additional results)
30. Applicant's Handbook: Consumptive Uses of Water. Chapter 40C 2, F.A.C.
31. BMP Costs & Savings Study, A Guide to Data and Methods for Cost Effectiveness Analysis of Urban Water Conservation Best Management Practices
32. City of Dunedin Water Saver Kit Retrofit Program
33. City of St. Petersburg Toilet Replacement Program
34. Memorandum of Understanding Regarding Urban Water Conservation in California
35. Residential Ultra Low Flush Toilet Replacement Program
36. RetroFit
37. ULF Toilet Marketing and Implementations Strategies Program Final Report

PAGE 77

Table 5-2. Google Site Search results
1. City of Temple Terrace Water Conservation Program Plumbing Retrofit Project (p)
2. Low Flow Plumbing Fixtures (w)
3. High Efficiency Plumbing Fixtures Toilets and Urinals (p)
4. plumbing (k)
5. City of St. Petersburg Toilet Replacement Program (p)
6. BMP 5_Water Use by Urinals WORKING PAPER (p)
7. Residential Ultra Low Flush Toilet Replacement Program (p)
8. Toilets (p)
9. Cisterns in the State of Florida (p)
10. ULF Toilet Marketing and Implementations Strategies Program Final Report (p)
11. Dual flush Toilet Project (p)
12. Water Conservation Tips (p)
13. toilet (k)
14. Sustainable Urban Water Infrastructure Systems (p)
15. Benchmarks Used in Conservation Planning Appendix B (p)
16. Measure 4: Swimming Pool Water Use Analysis by Observed Data and Long term Continuous Simulation (p)
17. University of Florida & Conserve Florida EZ Guide User Manual with Case Study Example Version 1.0 (p)
18. Summary of State and National Research Priorities Related to Water Conservation and Water Use Efficiency 2009 (p)
19. BMP Costs & Savings Study, A Guide to Data and Methods for Cost Effectiveness Analysis of Urban Water Conservation Best Management Practices (p)
20. RetroFit (p)
21. High Efficiency Toilets (HETs) (p)
22. St. Petersburg Ultra Low Flow Toilet and Water Use Evaluation Rebate Project (p)
23. Hillsborough County Ultra Low Flow Rebate Program (p)
24. Quarterly Presentation July 16, 2007 (p)
25. High Efficiency Toilets (HETs) USA and Canada (p)
26. Hillsborough County Ultra Low Flow Rebate Program (p)
27. Analysis of residential demand of water in the St. Johns River Water Management District (p)
28. Water Supply, Developing Sustainable Water Supplies to Meet Current and Future Demands (p)
29. Great Toilet Rebate Program (p)
30. Publication (t)
31. Water Conservation & Water Consumption in Spring Hill, Fl (p)
32. Promote WaterSense: What Are You Waiting For? (p)
33. Florida's Public Supply Water Conservation Performance Measurement System, Appendix A, B, C, & D (p)
34. Heart of Florida Regional Water Conservation Workshop (p)
35. Hillsborough County Ultra Low Flow Rebate Program (p)
36. Water Use Permit Information Manual Part B Basis of Review (p)
37. Applications and Practices to Maximize Conservation of Florida's Water Resources, Indoor and Water Features Water Use (p)
38. Great Toilet Rebate Program (p)
39. Potential Water Savings of Conservation Techniques (p)
40. Long Term Demand Forecasting Model, Development of Uniform Billing Methodologies (p)
41. BMP 8_Florida Water Loss Methodologies WORKING PAPER (p)
42. Potential Water Savings of Conservation Techniques (p)

PAGE 78

43. Retrofit Programs & Reuse Projects & Outdoor Water Conservation Efforts: Summary Report (p)
44. Investigation of areas where domestic self supply wells are sensitive to water level decline (p)
45. Florida's Public Supply Water Conservation Performance Measurement System (p)
46. Standards for Landscape Irrigation in Florida (p)
47. Applicant's Handbook: Consumptive Uses of Water. Chapter 40C 2, F.A.C. (p)
48. Pinellas County Utilities Water Conservation Opportunities Study (p)
49. Commercial and Institutional end Uses of Water (p)
50. Southwest Florida Water Management District (o)
51. U.S. Department of Energy (o)
52. aggressive water (k)
53. Domestic Water Conservation Technologies (p)
54. Report (t)
55. Residential Indoor Water Conservation Study: Evaluation of high efficiency indoor plumbing fixture retrofits in Single family homes in the East Bay municipal utility district service area (p)
56. John Koeller (person)
57. Keywords p (i)
58. urinal (k)
59. ultra low flush (ULF) toilet rebates or retrofits (k)
60. leak detection and repair (k)
61. urinal rebates or retrofits (k)
62. water conservation workshop (k)
63. backflow (k)
64. Website (t)
65. Keywords (i)
66. Conserve Florida Water Clearinghouse Research Agenda 2008 (p)
67. Evaluation of indoor urban water use and water loss management as conservation options in Florida (p)
(p) publication, (k) keyword, (w) website, (o) organization, (t) resource type, (i) keyword index

Table 5-3. Ten most frequent concepts extracted from two corpora.
Rank    documents corpus    web corpus
1       water               water
2       soil                soil
3       irrigation          plants
4       Florida             irrigation
5       trees               nitrogen
6       plants              management
7       crops               citrus
8       area                crops
9       citrus              information
10      management          fertilizers

PAGE 79

Table 5-4. Concepts similar to citrus obtained using 2 corpora and 2 measures

Corpus: documents. Similarity measure: matching coefficient. Average NAL distance: 7.3.
Most similar concepts to citrus: florida, water, research, management, soil, methods, time, area, trees, surfaces

Corpus: documents. Similarity measure: cosine. Average NAL distance: 6.1.
Most similar concepts to citrus: groves, trees, oranges, lakes, florida, fruits, rootstocks, management, research, water

Corpus: web. Similarity measure: matching coefficient. Average NAL distance: 3.1.
Most similar concepts to citrus: fruits, oranges, florida, grapefruits, trees, citrus paradisi, aurantium, plants, mandarins, citrus fruits

Corpus: web. Similarity measure: cosine. Average NAL distance: 1.3.
Most similar concepts to citrus: citrus paradisi, citrus fruits, citrus sinensis, citrus aurantium, citrus reticulata, oranges, grapefruits, aurantium, mandarins, citrus peels

Table 5-5. Concepts similar to nitrogen obtained using 2 corpora and 2 measures

Corpus: documents. Similarity measure: matching coefficient. Average NAL distance: 6.0.
Most similar concepts to nitrogen: water, methods, research, florida, soil, time, materials, nutrients, fertilizers, fields

Corpus: documents. Similarity measure: cosine. Average NAL distance: 5.2.
Most similar concepts to nitrogen: nutrients, fertilizers, methods, materials, samples, soil, water, nitrates, concentration, research

Corpus: web. Similarity measure: matching coefficient. Average NAL distance: 4.8.
Most similar concepts to nitrogen: fertilizers, soil, nitrates, nitrogen fertilizers, uptake, water, plants, leaching, phosphorus, management

Corpus: web. Similarity measure: cosine.
Most similar concepts to nitrogen: nitrogen fertilizers, nitrogen mineralization, nitrogen use efficiency, uptake, ammonium nitrogen, nitrates, fertilizers, soil solution, mineralization, nitrogen content

PAGE 80

Table 5-6. Concepts similar to irrigation obtained using 2 corpora and 2 measures

Corpus: documents. Similarity measure: matching coefficient. Average NAL distance: 4.6.
Most similar concepts to irrigation: water, florida, soil, research, methods, management, time, crops, fields, surfaces

Corpus: documents. Similarity measure: cosine. Average NAL distance: 4.7.
Most similar concepts to irrigation: water, soil, florida, crops, methods, management, research, rainfall, area, time

Corpus: web. Similarity measure: matching coefficient. Average NAL distance: 2.7.
Most similar concepts to irrigation: water, irrigation systems, drip irrigation, sprinklers, soil, management, crops, seepage, irrigation water, irrigation scheduling

Corpus: web. Similarity measure: cosine. Average NAL distance: 2.1.
Most similar concepts to irrigation: irrigation systems, drip irrigation, sprinklers, irrigation water, irrigation scheduling, seepage, irrigation management, deficit irrigation, trickle irrigation, overhead irrigation

Figure 5-1. Fragment of the top-level part of the WCDL ontology

PAGE 81

Figure 5-2. Fragment of the domain-specific part of the WCDL ontology
Figure 5-3. Web page corresponding to the ontology concept shown in Figure 5-2.

PAGE 82

Figure 5-4. Publication individual with relationships
Figure 5-5. Web page corresponding to publication individual shown in Figure 5-4.

PAGE 83

Figure 5-6. Fragment of the WCDL ontology in ObjectEditor

PAGE 84

Figure 5-7. Fragment of the simulation ontology in LyraBrowser

Figure 5-8. Most probable parse for the title "Advanced Guidelines for Preparing Water Conservation Plans"
(ROOT (NP (NP (NNP Advanced)) (NP (NP (NNS Guidelines)) (PP (IN for) (NP (NNP Preparing) (NNP Water) (NNP Conservation) (NNPS Plans))))))

PAGE 85

Figure 5-9. Most probable parse for the title "Herbicide Transport in a Restored Riparian Forest Buffer System"
(ROOT (NP (NP (NNP Herbicide) (NNP Transport)) (PP (IN in) (NP (DT a) (NNP Restored) (NNP Riparian) (NNP Forest) (NNP Buffer) (NNP System)))))

Figure 5-10. Summary of concept extraction using Stanford Parser.
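The step from typed dependencies to two-word noun phrases, and the rough check against an abstract, can be sketched as follows. This is a minimal illustration in Python rather than the Java application used in this work. It assumes the dependencies are available as text lines in the usual relation(head-index, modifier-index) form; the function names and the sample abstract in the usage note are hypothetical.

```python
import re

# One typed dependency per line, e.g. "nn(Plans-7, Water-5)".
DEP_PATTERN = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def noun_phrases(dependencies):
    """Return 'modifier head' phrases for every nn (noun compound) relation."""
    phrases = []
    for line in dependencies:
        match = DEP_PATTERN.fullmatch(line.strip())
        if match is None:
            continue  # skip anything that is not a well-formed dependency
        relation, head, _, modifier, _ = match.groups()
        if relation == "nn":
            phrases.append(f"{modifier} {head}".lower())
    return phrases


def count_in_abstract(phrase, abstract):
    """Count case-insensitive occurrences of a phrase in an abstract."""
    return abstract.lower().count(phrase.lower())


FIGURE_5_8_DEPS = [
    "dep(Advanced-1, Guidelines-2)",
    "nn(Plans-7, Preparing-4)",
    "nn(Plans-7, Water-5)",
    "nn(Plans-7, Conservation-6)",
    "prep_for(Guidelines-2, Plans-7)",
]
print(noun_phrases(FIGURE_5_8_DEPS))
```

For the dependencies of Figure 5-8 this yields the three phrases named in the text. For the second title the same rule would also produce "restored system", whereas the text lists only four phrases, so the original application evidently applied some further filtering that this sketch does not attempt to reproduce. A call such as count_in_abstract("buffer system", abstract_text) corresponds to the rough evaluation summarized in Figure 5-10.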

PAGE 86

Figure 5-11. Zipf's law for Reuters Corpus Volume 1 (source: Manning et al. 2008)
Figure 5-12. Frequency distribution of concepts in the BMP Publication Library
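The comparison of a frequency distribution with a Zipf curve, as in Figures 5-11 and 5-12, can be approximated numerically by fitting a straight line to log frequency versus log rank. The sketch below is a generic illustration, not the procedure used to produce the figures; the function names and the synthetic frequency list are assumptions.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Frequencies of distinct tokens, sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic frequencies that follow Zipf's law exactly: f = 1000 / rank.
ideal = [1000 / rank for rank in range(1, 51)]
print(round(loglog_slope(ideal), 6))
```

For the synthetic list the fitted slope equals -1 up to floating-point error. A real corpus, passed through rank_frequency first, would be expected to deviate from the line at the highest and lowest ranks, as the curved line in Figure 5-11 does.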

PAGE 87

Figure 5-13. Fragment of NAL Thesaurus. Terms shown: citrus, citrus fruits, grapefruits, tree fruits, fruit trees, trees; relationship labels: RT, NT, BT.
Figure 5-14. Different similarity measures and corpora against NAL distance. A) matching coefficient/documents, B) cosine/documents, C) matching coefficient/web, D) cosine/web.
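The NAL distance, defined in the text as the number of edges in the shortest path between two terms, can be computed with a breadth-first search. In the sketch below the terms are those of Figure 5-13, but the edges connecting them are an assumption chosen so that the two distances quoted in the text (2 and 4) are reproduced; the actual thesaurus links, and whether BT, NT and RT links are all treated alike, may differ.

```python
from collections import deque

# Assumed undirected links between the terms of Figure 5-13 (hypothetical).
EDGES = [
    ("citrus", "citrus fruits"),
    ("citrus fruits", "grapefruits"),
    ("citrus fruits", "tree fruits"),
    ("tree fruits", "fruit trees"),
    ("fruit trees", "trees"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def nal_distance(graph, start, goal):
    """Number of edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, dist = queue.popleft()
        for neighbor in graph.get(term, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


graph = build_graph(EDGES)
print(nal_distance(graph, "citrus", "grapefruits"))  # 2
print(nal_distance(graph, "citrus", "trees"))        # 4
```

Averaging a similarity measure over all pairs that share the same distance value would then give one point per distance, which is how the points in Figure 5-14 are described in the text.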

PAGE 88

CHAPTER 6
CONCLUSIONS AND FUTURE WORK

Conclusions

This work explored a number of techniques developed in the fields of information retrieval, knowledge organization, information extraction and computational linguistics and applied them to the agriculture and natural resources domain.

An ontology KOS was used to address the information retrieval problem in the Water Conservation Digital Library. The implementation of the approach has been available for public use for over three years. During this time it demonstrated numerous benefits of the use of the ontology. It allows the information to be easily presented in many different ways. The ontology, when properly constructed, makes finding resources of interest easier and more effective compared to a traditional text search. Furthermore, the flexibility and extensibility of output formats and standards allow for information sharing and reuse. Most importantly, the ontology proved to be an effective aid in the organization and management of a growing collection of various types of digital resources.

On the other hand, some drawbacks of the approach were observed. First, the initial workload associated with developing an ontology of adequate complexity was significant and could be discouraging for certain applications. Furthermore, there were difficulties with keeping the ontology consistent and with accurate manual cataloging of resources, especially when the number of concepts increased above a certain level. Some of these issues were caused by the limitations of the authoring tool, but most of the problems called for automation of the process of creating and editing the ontology. A semi-automatic approach was adopted that suggests the most relevant ontology concepts and significantly limits the amount of information that needs to be inspected, but the

PAGE 89

final editing decision belongs to a human. Various IE and CL techniques were explored, such as stop words, stemming, probabilistic parsing, and VSM with statistical measures including matching coefficient, cosine and TF-IDF. Additionally, external resources, such as the NAL Thesaurus and a domain-specific corpus generated with the use of an Internet search engine, were incorporated in the methodology.

It was found that some of the examined techniques, such as probabilistic parsing performed with the Stanford Parser, need further improvements to be useful for the IE purposes of this work. The parser was unable to identify many important concepts, while most of the concepts extracted using the parser do not occur in the abstract of the document they were extracted from. These deficiencies are possibly caused by the standard grammar that was used for parsing and could be remedied by providing a domain-specific grammar that reflects the characteristics of texts in the collection.

Another approach was adopted that extracts concepts using an N-gram model with a lexicon (the NAL Thesaurus was used for this purpose) and uses VSM to provide a measure of semantic similarity for any two concepts. Two corpora (original documents in the collection and snippets obtained from an Internet search engine), in combination with two statistical measures (matching coefficient and cosine), were used to estimate the semantic similarity. Results for each method were compared against the distance between the concepts in the NAL Thesaurus. Cosine based on the web search snippets corpus yielded results most strongly related to NAL distance (R² = 0.97), which indicates that this method provides the most valuable information.

A new web-based interface was developed that allows for accessing and authoring the information in the Water Conservation Digital Library and the BMP Publication Library.

PAGE 90

Both web sites have been available online and actively used by the public since 2008 and 2010 respectively. During this period feedback was received from a number of users. Many reported a substantial learning period to become familiar with the system, especially with browsing the ontology concepts using more general, more specific and related concepts. There were requests to make the search box more distinct in the web interface, as this method of IR is perceived as simpler and more uniform with other web sites that users are familiar with. Most users prefer to type a search query related to their information need and obtain results, for two reasons. First, they are probably already familiar with conventional search engines, and second, it allows them to express the information need in their own words. These observations imply that KOSes, such as ontologies, need to be incorporated in the IR process in a way that is not discouraging for the users. The ontology-assisted search methodology developed within this work reflects this kind of approach, where an ontology complements a traditional text search. A comparison of the results of several IR approaches, presented in the previous chapter, showed that many more relevant documents could be found by incorporating semantics of the ontology in the retrieval process.

Future Work

The Stanford Parser is a probabilistic parser and its performance depends on the grammar it is supplied with. The package includes an English grammar based on the Penn Treebank corpus, consisting of over 4.5 million words that were annotated for part-of-speech and syntactic information. The corpus is based on general English and does not reflect the specifics of the agricultural domain. In order to improve the performance of the parser, a domain-specific grammar needs to be developed. The steps involve selecting a number of texts representative of the agricultural domain and manual annotation of


these texts for part-of-speech and syntactic information according to the Penn Treebank annotation format.

Another direction for future work is a more formal evaluation of the results of the statistical methods of IE and CL and a comparison against a traditional text search, such as Google. Because the relevance of a particular query (or concept, in the case of ontology-based retrieval) to a particular document is a matter of human judgment, there is no fully automatic way to perform this kind of evaluation. For this purpose a survey needs to be conducted involving multiple domain experts. Each expert would be provided a number of documents, each with a number of concepts discovered using various IE techniques. The relevance of each concept to the document would then be assessed by each expert on a certain scale. The experts could also suggest additional concepts that they find more relevant. The documents need to be chosen in a way that covers a number of top results from the example text searches being evaluated.

Finally, a selection of techniques that are confirmed to produce the best results in the evaluation needs to be incorporated in an ontology authoring tool. An existing tool, such as LyraBrowser, can be adapted for this purpose or a new tool can be developed. The tool would facilitate ontology authoring by providing the most similar ontology concepts when adding a new one, and a list of the most relevant concepts for a given text document.
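One way the proposed expert survey could be summarized is sketched below. It assumes a 1 to 5 relevance scale and a flat record layout of (expert, document, concept, technique, score). The scale, the technique labels, and all ratings are hypothetical placeholders, since no such survey has been conducted within this work.

```python
from collections import defaultdict
from statistics import mean


def mean_relevance_by_technique(ratings):
    """Average the expert relevance scores for each IE technique.

    ratings: iterable of (expert, document, concept, technique, score) tuples.
    """
    scores = defaultdict(list)
    for _expert, _document, _concept, technique, score in ratings:
        scores[technique].append(score)
    return {technique: mean(values) for technique, values in scores.items()}


# Hypothetical ratings on an assumed 1-5 scale (5 = highly relevant).
ratings = [
    ("expert_a", "doc1", "irrigation", "ngram_lexicon", 5),
    ("expert_b", "doc1", "irrigation", "ngram_lexicon", 4),
    ("expert_a", "doc1", "water model", "parser", 2),
    ("expert_b", "doc1", "water model", "parser", 1),
]

summary = mean_relevance_by_technique(ratings)
```

A fuller analysis would also need to examine agreement between experts before the averaged scores are used to select techniques for the authoring tool.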


APPENDIX A
TITLE PARSING RESULTS: WATER CONSERVATION DIGITAL LIBRARY

This appendix contains a list of two-word noun phrases extracted from publication titles from the Water Conservation Digital Library. The noun phrases were obtained with a Java application developed using libraries available in the Stanford Parser package with the English PCFG grammar. The list below contains the title of each publication followed by the noun phrases extracted from this title.

Advanced Guidelines for Preparing Water Conservation Plans
    Preparing Plans, Water Plans, Conservation Plans
Affordability Analysis of Alternative Water Supply
    Affordability Analysis, Alternative Supply, Water Supply
Agricultural Thesaurus and Glossary
    Agricultural Thesaurus
Agricultural Water Use Model
    Agricultural Model, Water Model, Use Model
A Guide to Florida Friendly Landscaping
    Florida Friendly Landscaping
A Guide to Micro Irrigation for West Central Florida Landscapes. How to save water through proper planning, operation and maintenance.
    West Central Landscapes, Florida Landscapes
A Guide to Understanding and Protecting the Southern Coastal Watershed
    Southern Watershed, Coastal Watershed
Analysis of residential demand of water in the St. Johns River Water Management District
    St. District, Johns District, River District, Water District, Management District
Analysis of Water Conservation Measures for Public Supply


    Water Conservation, Public Supply
Annual Maintenance and Evaluation of Overhead Irrigation Systems
    Annual Maintenance, Overhead Systems, Irrigation Systems
Annual water use data: 2001
    water data, use data
Annual water use data: 2002
    water data, use data
Annual water use data: 2003
    water data, use data
Annual water use data: 2004
    water data, use data
Annual water use data: 2005
    water data, use data
Annual water use survey: 1978
    water survey, use survey
Annual water use survey: 1984
    water survey, use survey
Annual water use survey: 1987
    water survey, use survey
Annual water use survey: 1988
    water survey, use survey
Annual water use survey: 1989
    water survey, use survey
Annual water use survey: 1990
    water survey, use survey
Annual water use survey: 1991
    water survey, use survey
Annual water use survey: 1992
    water survey, use survey
Annual water use survey: 1993
    water survey


    use survey
Annual water use survey: 1994
    water survey, use survey
Annual water use survey: 1995
    water survey, use survey
Annual water use survey: 1996
    water survey, use survey
Applicable rules and regulations for seawater demineralization: Task B.6 for the seawater demineralization feasibility investigation
    seawater demineralization, Task B., seawater investigation, demineralization investigation, feasibility investigation
Applicant's Handbook: Consumptive Uses of Water. Chapter 40C-2, F.A.C.
    Consumptive Uses, Water Chapter
Applications and Practices to Maximize Conservation of Florida's Water Resources, Indoor and Water Features Water Use
    Maximize Conservation, Water Resources, Water Use, Features Use, Water Use
A Primer on HETs
Aquifer storage and recovery issues and concepts
    recovery issues
Basic Guidelines for Preparing Water Conservation Plans
    Preparing Plans, Water Plans, Conservation Plans
Benchmarks Used in Conservation Planning Appendix B
    Conservation B, Planning B, Appendix B
BMP Costs & Savings Study, A Guide to Data and Methods for Cost Effectiveness Analysis of Urban Water Conservation Best Management Practices
    Costs Study, Urban Practices, Water Practices, Conservation Practices


    Best Practices, Management Practices
Capturing Water with Rain Barrels
    Rain Barrels
Causes and Prevention of Emitter Plugging In Microirrigation Systems
    Microirrigation Systems
Central Florida aquifer recharge enhancement program: phase 1, artificial recharge well demonstration project
    Central program, Florida program, enhancement program, recharge project, well project, demonstration project
Central Florida Artificial Recharge Demonstration Program: alternative water supply strategies in the St. Johns River Water Management District
    Central Recharge, Florida Recharge, Artificial Recharge, Demonstration Program, water strategies, supply strategies, St. District, Johns District, River District, Water District, Management District
Cisterns in the State of Florida
Cisterns To Collect Non Potable Water For Domestic Use
    Non Potable Water
City of Dunedin Water Saver Kit Retrofit Program
    Dunedin Program, Water Program, Saver Program, Kit Program, Retrofit Program
City of St. Petersburg Toilet Replacement Program
    St. Program, Petersburg Program, Toilet Program, Replacement Program
City of St. Petersburg Toilet Replacement Program
    St. Program, Petersburg Program, Toilet Program


    Replacement Program
City of Tampa Landscape Water Audit
    Tampa Audit, Landscape Audit, Water Audit
City of Temple Terrace Water Conservation Program Plumbing Retrofit Project
    Temple Program, Terrace Program, Water Program, Conservation Program, Plumbing Project, Retrofit Project
City of Winter Haven Water Conservation Program
    Winter Program, Haven Program, Water Program, Conservation Program
Commercial and Institutional end Uses of Water
    end Uses
Comparative review of use of wetland constraints in the water supply planning process
    water process, supply process
Computer simulation of the predevelopment and current Floridan Aquifer system in Northeast Florida
    Floridan system, Aquifer system, Northeast Florida
Conserve Florida Water Clearinghouse Draft Long Term Plan
    Conserve Plan, Florida Plan, Water Plan, Clearinghouse Plan, Draft Plan, Long Term Plan
Conserve Florida Water Clearinghouse Long Term Plan Overview
    Conserve Overview, Florida Overview, Water Overview, Clearinghouse Overview, Long Term Overview, Plan Overview
Conserve Florida Water Clearinghouse Overview
    Conserve Overview, Florida Overview, Water Overview


    Clearinghouse Overview
Conserve Florida Water Clearinghouse Research Agenda
    Conserve Agenda, Florida Agenda, Water Agenda, Clearinghouse Agenda, Research Agenda
Conserve Florida Water Conservation Guide
    Conserve Guide, Florida Guide, Water Guide, Conservation Guide
Conserve Florida Water The Guide
    Conserve Guide, Florida Guide, Water Guide, The Guide
Conserving Water in the Home Landscape
    Home Landscape
Consideration of Cisterns as a Water Conservation Measure
    Water Measure, Conservation Measure
Cost Estimating and Economic Criteria for 2005 District Water Supply Plan
    Estimating Criteria, District Plan, Water Plan, Supply Plan
Criteria for preliminary screening of areas for potential seawater demineralization facilities task C.1. for the seawater demineralization feasibility investigation
    seawater facilities, demineralization facilities, seawater investigation, demineralization investigation, feasibility investigation
Crystal Lakes Manors A Case Study
    Crystal Manors, Lakes Manors, Case Study
Daily Water Use at Home
    Daily Use, Water Use
Demineralization concentrate database task B.2 and GIS data layers task B.3 for the investigation of demineralization concentrate management project
    Demineralization database


    concentrate database, GIS layers, data layers, demineralization project, concentrate project, management project
Demineralization concentrate ocean outfall feasibility study: evaluation of additional information needs
    Demineralization ocean, concentrate ocean, feasibility study, information needs
Developing Research and Outreach Plans
    Research Plans
District Permitting
    District Permitting
District Water Supply Plan 2005
    District 2005, Water 2005, Supply 2005, Plan 2005
District Water Supply Plan Appendixes
    District Appendixes, Water Appendixes, Supply Appendixes, Plan Appendixes
Domestic Water Conservation Technologies
    Domestic Technologies, Water Technologies, Conservation Technologies
Drought Tolerant Plants for North and Central Florida
    Drought Plants, Tolerant Plants, North Florida
Dual flush Toilet Project, Canada Mortgage and Housing Corporation
    Dual flush Project, Toilet Project, Canada Mortgage, Housing Corporation
East Central Florida Water Supply Initiative St. Johns River Water Supply Project Surface Water Treatment Plant Siting Study Level 3 Analysis: Detailed Site Specific Screening
    East Analysis, Central Analysis, Florida Analysis, Water Analysis


    Supply Analysis, Initiative Analysis, St. Analysis, Johns Analysis, River Analysis, Water Analysis, Supply Analysis, Project Analysis, Surface Analysis, Water Analysis, Treatment Analysis, Plant Analysis, Siting Analysis, Study Analysis, Level Analysis, Detailed Screening, Site Specific Screening
East Central Florida water supply planning initiative: East Central Florida water agenda: a report on the water supply planning initiative process
    East Florida, Central Florida, water initiative, supply initiative, planning initiative, East Central agenda, Florida agenda, water agenda, water process, supply process, planning process, initiative process
East Central Florida water supply planning initiative: final report
    East Central supply, Florida supply, water supply, planning initiative
East central Florida water supply planning initiative: phase II: annual report of activities and accomplishments, 2003
    East central Florida, water initiative, supply initiative, planning initiative
East Central Florida water supply planning initiative phase II: annual report of activities and accomplishments
    East Central Florida


    water phase, supply phase, planning phase, initiative phase
Efficiencies of Irrigation Systems Used in Florida Nurseries
    Irrigation Systems, Florida Nurseries
Energy Down the Drain
Estimates of upper Floridan Aquifer recharge augmentation based on hydraulic and water quality data (1986-2002) from the Water Conserv II RIB systems
    Floridan augmentation, Aquifer augmentation, Water systems, Conserv systems, II systems, RIB systems
Evaluating Implementation of Multiple Irrigation and Landscape Ordinances in the Tampa Bay Region
    Landscape Ordinances, Tampa Region, Bay Region
Evaluating Irrigation Pumping Systems
    Evaluating Systems, Irrigation Systems, Pumping Systems
Evaluating Urban Water Conservation Programs: A Procedures Manual
    Evaluating Programs, Urban Programs, Water Programs, Conservation Programs, Procedures Manual
Evaluation and Demonstration of Evapotranspiration Based Irrigation Controllers
    Evapotranspiration Based Controllers, Irrigation Controllers
Evaluation of Davis Islands landscape irrigation system conservation program
    Davis Islands, landscape program, irrigation program, system program, conservation program
Evaluation of Evapotranspiration and Soil Moisture based Irrigation Control on Turfgrass
    Moisture based Control


    Irrigation Control
Evaluation of Urban Water Conservation in Florida
    Urban Conservation, Water Conservation
Evaluation of wetland and lake constraint sites in Lake, Orange, Osceola, Seminole and Volusia Counties
    wetland sites, constraint sites, Lake Counties
Evaporation Loss During Sprinkler Irrigation
    Evaporation Loss, Sprinkler Irrigation
FDEP, 2005
Fertilizer Facts
    Fertilizer Facts
Final report: decision modeling for alternative water supply strategies
    decision modeling, water strategies, supply strategies
Final report: development of a population based water use model
    water model, use model
Final Report: Phase I: Implementation of Water Conservation Rate Structures
    Final Report, Water Structures, Conservation Structures, Rate Structures
Final report on five potential seawater demineralization project sites Task C.5 for the seawater demineralization feasibility investigation
    demineralization sites, project sites, Task C., seawater investigation, demineralization investigation, feasibility investigation
Florida's Public Supply Water Conservation Performance Measurement System
    Public System, Supply System, Water System, Conservation System, Performance System, Measurement System


Florida's Public Supply Water Conservation Performance Measurement System, Appendix A, B, C, & D
    Public System, Supply System, Water System, Conservation System, Performance System, Measurement System, Appendix A
Florida water conservation initiative
    water initiative, conservation initiative
Florida Water Rates Evaluation of Single family Homes
    Florida Rates, Water Rates, Single family Homes
Florida Waters
    Florida Waters
Flushing Procedures for Microirrigation Systems
    Microirrigation Systems
Great Toilet Rebate Program
    Great Program, Toilet Program, Rebate Program
Great Toilet Rebate Program
    Great Program, Toilet Program, Rebate Program
Great Toilet Rebate Program
    Great Program, Toilet Program, Rebate Program
GUIDE Software updates
    GUIDE updates, Software updates
Handbook of Water Use and Conservation
    Water Use
Hazardous Chemicals
    Hazardous Chemicals
Heart of Florida Regional Water Conservation Workshop
    Florida Workshop, Regional Workshop, Water Workshop, Conservation Workshop
High Efficiency Plumbing Fixtures Toilets and Urinals
    High Efficiency Fixtures, Plumbing Fixtures


High Efficiency Toilets (HETs)
    High Toilets, Efficiency Toilets
High Efficiency Toilets (HETs) USA and Canada
    High Toilets, Efficiency Toilets
Hillsborough County Ultra Low Flow Rebate Program
    Hillsborough Program, County Program, Ultra Program, Low Program, Flow Program, Rebate Program
Hillsborough County Ultra Low Flow Rebate Program
    Hillsborough Program, County Program, Ultra Program, Low Program, Flow Program, Rebate Program
Hillsborough County Ultra Low Flow Rebate Program
    Hillsborough Program, County Program, Ultra Program, Low Program, Flow Program, Rebate Program
Hillsborough County Ultra Low Flow Rebate Program
    Hillsborough Program, County Program, Ultra Program, Low Program, Flow Program, Rebate Program
Hillsborough County Ultra Low Flow Rebate Program
    Hillsborough Program, County Program, Ultra Program, Low Program, Flow Program, Rebate Program
Hillsborough County Ultra Low Flow Rebate Program Phase 3
    Hillsborough Rebate, County Rebate, Ultra Rebate, Low Rebate, Flow Rebate


    Program Phase
Home Irrigation and Landscape Combinations for Water Conservation in Florida
    Home Irrigation, Landscape Combinations, Water Conservation
How to Calibrate Your Sprinkler System
    Sprinkler System
Identification of favorable sites for feasible seawater demineralization: Task C.4 for the seawater demineralization feasibility investigation
    seawater demineralization, Task C., seawater investigation, demineralization investigation, feasibility investigation
Intermediate Guidelines for Preparing Water Conservation Plans
    Preparing Plans, Water Plans, Conservation Plans
Investigation of areas where domestic self supply wells are sensitive to water level decline
    level decline
Investigation of demineralization concentrate management
Irrigating With High Salinity Water
    High Water, Salinity Water
Irrigation of Lawns and Gardens
Irrigation System Maintenance
    Irrigation Maintenance, System Maintenance
Landscape Design for Water Conservation
    Landscape Design, Water Conservation
Lawn Sprinkler Selection and Layout for Uniform Water Application
    Lawn Sprinkler, Uniform Application, Water Application
LEED® for Commercial Interiors
    Commercial Interiors
LEED® for Core & Shell Development
    Core Development
LEED® for Existing Buildings: Operations & Maintenance Rating System (PDF)
    Operations System, Rating System


LEED® for Homes Rating System
    Homes System, Rating System
LEED® for Neighborhood Development
    Neighborhood Development
LEED® for New Construction
    New Construction
LEED® for Retail for New Construction and Major Renovations Pilot
    New Construction, Major Renovations
LEED® for Schools for New Construction and Major Renovations
    New Construction, Major Renovations
Let Your Lawn Tell You When To Water
Living in Florida's Watersheds
Long Term Demand Forecasting Model, Development of Uniform Billing Methodologies
    Long Term Forecasting, Demand Forecasting, Model Development, Uniform Methodologies, Billing Methodologies
Long term Plan Conserve Florida Water Clearinghouse
    Plan Clearinghouse, Conserve Clearinghouse, Florida Clearinghouse, Water Clearinghouse
Low Impact Development Design Strategies An Integrated Design Approach
    Low Impact Strategies, Development Strategies, Design Strategies, Integrated Approach, Design Approach
Managing Your Florida Lawn Under Drought Conditions
    Florida Lawn, Drought Conditions
Maximum Performance (MaP?) Testing a new and better measurement of toilet performance
    Maximum Testing, Performance Testing, toilet performance
Maximum Performance (MaP) Testing of Popular Toilet Models
    Maximum Testing, Performance Testing, Toilet Models


Maximum Performance (MaP) Testing of Popular Toilet Models
    Maximum Testing, Performance Testing, Toilet Models
Memorandum of Understanding Regarding Urban Water Conservation in California
    Urban Conservation, Water Conservation
Microirrigation in The Landscape
Middle St. Johns River minimum flow levels hydrologic methods report
    Middle River, St. River, Johns River, flow levels, methods report
Minutes February 19, 2008
Minutes September 18, 2007
Model Native Plant Landscape Ordinance Handbook
    Model Handbook, Native Handbook, Plant Handbook, Landscape Handbook, Ordinance Handbook
Model Water Efficient Irrigation and Landscape Ordinance.
    Model Water, Efficient Irrigation, Landscape Ordinance
MODULE CBC Learning Resources Catalog Crop Biosecurity and the NPDN
    MODULE Catalog, CBC Catalog, Learning Catalog, Resources Catalog, Crop Biosecurity
Monthly Report February 19, 2008
Monthly Report November 13, 2007
National Performance Review. Serving the American Public: Best Practices in Performance Measurement: Benchmarking Study Report
    National Serving, Performance Serving, Review Serving, American Public, Best Practices, Performance Measurement, Benchmarking Report, Study Report


Ocklawaha River Water Allocation Study
    Ocklawaha Study, River Study, Water Study, Allocation Study
Operation of Residential Irrigation Controllers
    Residential Controllers, Irrigation Controllers
Performance Standards for demonstrating Urban Water Conservation, a Briefing Book
    Performance Standards, Urban Conservation, Water Conservation, Briefing Book
Pinellas County Utilities Water Conservation Opportunities Study
    County Water, Utilities Water, Conservation Study, Opportunities Study
Population and water usage projection: technical memorandum
    Population projection, usage projection
Potential Water Savings of Conservation Techniques
    Water Savings, Conservation Techniques
Potential Water Savings of Conservation Techniques
    Water Savings, Conservation Techniques
Predicting areas of future public water supply problems: a geographic information system approach
    water problems, supply problems, system approach
Preliminary Investigation of Supplementing the City of Apopka Reuse System with Water Withdrawn from Lake Apopka
    Preliminary Investigation, Apopka System, Reuse System, Water Withdrawn, Lake Apopka
Projected aquifer drawdowns Palm Coast Utility Corporation wellfields
    aquifer drawdowns, Palm Corporation, Coast Corporation, Utility Corporation
Quarterly Presentation July 16, 2007


Realizing the Benefits from Water Conservation
    Water Conservation
Regional characterization and assessment of the potential for saltwater intrusion in Northeast Florida and Camden County, Georgia, using the sharp interface approach
    Regional characterization, saltwater intrusion, Northeast Florida, Camden County
Regional simulation of projected groundwater withdrawals from the Floridian aquifer system in western Volusia County and southeastern Putnam County, Florida
    Regional simulation, groundwater withdrawals, Floridian system, aquifer system, Volusia County, Putnam Florida, County Florida
Required Guide Data Inputs
    Required Inputs, Guide Inputs, Data Inputs
Residential Indoor Water Conservation
    Residential Conservation, Indoor Conservation, Water Conservation
Residential Irrigation Based on Soil Moisture
    Residential Irrigation, Soil Moisture
Residential Irrigation Water Use in Central Florida
    Residential Irrigation, Water Use, Central Florida
Residential Ultra Low Flush Toilet Replacement Program
    Residential Program, Ultra Low Flush Program, Toilet Program, Replacement Program
RetroFit
Retrofit Programs & Reuse Projects & Outdoor Water Conservation Efforts: Summary Report
    Retrofit Programs, Reuse Projects, Outdoor Efforts, Water Efforts, Conservation Efforts


    Summary Report
Right Plant, Right Place
    Right Plant, Right Place
Rules of the Southwest Florida Water Management District Chapter 40D-21 Water Shortage Plan
    Southwest Plan, Florida Plan, Water Plan, Management Plan, District Plan, Chapter Plan, 40D-21 Plan, Water Plan, Shortage Plan
Save Water Indoors
    Save Indoors, Water Indoors
Saving Water Indoors
    Saving Indoors, Water Indoors
Saving Water Outdoors
    Saving Outdoors, Water Outdoors
Saving Water Using Your Irrigation System
    Irrigation System
Seawater demineralization concentrate characterization
    Seawater demineralization
Sensor Based Automation of Irrigation of Bermudagrass
    Sensor Based Automation
Simplified Guide Prototype
    Simplified Prototype, Guide Prototype
Simulation of the effects of groundwater withdrawals on the Floridan aquifer system in East central Florida: model expansion and revision
    groundwater withdrawals, Floridan system, aquifer system, East central Florida, model expansion
Single Family Residential Toilet Rebate Program Evaluation
    Residential Evaluation, Toilet Evaluation, Rebate Evaluation, Program Evaluation


Small Acreage Farm & Ranch Best Management Practices for Protecting Florida's Water
    Small Farm, Acreage Farm, Best Practices, Management Practices, Protecting Florida
Sprinkler Irrigation and Soil moisture Uniformity
    Sprinkler Irrigation, Soil Uniformity, moisture Uniformity
St. Johns River water supply project: literature review of surface water treatment technologies
    St. River, Johns River, water project, supply project, literature review, surface technologies, water technologies, treatment technologies
St. Johns River water supply project: surface water treatability and demineralization study: preliminary raw water characterization
    St. River, Johns River, water project, supply project, surface treatability, water treatability, demineralization study, water characterization
St. Petersburg Ultra Low Flow Toilet and Water Use Evaluation Rebate Project
    St. Ultra, Petersburg Ultra, Low Toilet, Flow Toilet, Water Project, Use Project, Evaluation Project, Rebate Project
Standards and Specifications for Turf and Landscape Irrigation Systems
    Landscape Systems, Irrigation Systems
Standards for Landscape Irrigation in Florida


    Landscape Irrigation
Stormwater Management System
    Stormwater System, Management System
Surface water treatability and demineralization study
    water treatability, demineralization study
Tampa Bay Water Drought Mitigation Plan
    Tampa Plan, Bay Plan, Water Plan, Drought Plan, Mitigation Plan
Technical memorandum B.5: applicable rules and regulations for concentrate management: Task B.5: applicable rules and regulations: investigation of demineralization concentrate management
    concentrate management, Task B., demineralization management, concentrate management
Technical memorandum B.7: demineralization treatment technologies for the seawater demineralization feasibility investigation
    Technical B., memorandum B., demineralization technologies, treatment technologies, seawater investigation, demineralization investigation, feasibility investigation
Technical memorandum C1: East Central Florida water supply initiative: St. Johns River water supply project: surface water treatment plant sitting study: public involvement/information plan
    memorandum C1, East initiative, Central initiative, Florida initiative, water initiative, supply initiative, St. project, Johns project, River project, water project, supply project, surface plant


    water plant, treatment plant, involvement/information plan
Ten Ways to Conserve Water
    Conserve Water
Terms of Environment: Glossary, Abbreviations and Acronyms
The population projection methodology of the St. Johns River Water Management District's 2003 District water supply assessment and 2005 District water supply plan
    population methodology, projection methodology, St. District, Johns District, River District, Water District, Management District, District water, assessment plan, District plan, water plan, supply plan
The Practice of Low Impact Development
    Low Development, Impact Development
The Real High Efficiency Toilets Have Arrived
    Real Toilets, High Efficiency Toilets
The Role of Conservation in Urban Water Supply & Demand Planning
    Urban Supply, Water Supply, Demand Planning
The Tampa Bay Water Long Term Demand Forecasting Model
    Tampa Forecasting, Bay Forecasting, Water Forecasting, Long Term Forecasting, Demand Forecasting
Toilets
Turf and Landscape Irrigation Best Management Practices
    Landscape Practices, Irrigation Practices, Best Practices, Management Practices
ULF Toilet Marketing and Implementations Strategies Program Final Report
    ULF Marketing, Toilet Marketing


    Implementations Report, Strategies Report, Program Report, Final Report
Understanding the Concepts of Uniformity and Efficiency in Irrigation
Urinals
USEPA Water Conservation Plan Guidelines (1998), Level 1 Measure
    USEPA Plan, Water Plan, Conservation Plan
USEPA Water Conservation Plan Guidelines Acronyms and Glossary Appendix C
    USEPA Acronyms, Water Acronyms, Conservation Acronyms, Plan Acronyms, Guidelines Acronyms, Appendix C
USEPA Water Conservation Plan Guidelines Information Resources Appendix D
    USEPA Resources, Water Resources, Conservation Resources, Plan Resources, Guidelines Resources, Information Resources, Appendix D
USGS Water Science Glossary of Terms
    Water Glossary, Science Glossary
Using the Irrigation Controller for a Better Lawn on Less Water
    Irrigation Controller, Better Lawn, Less Water
Volusian Water Alliance governance study: options and recommendations
    Volusian study, Water study, Alliance study, governance study
Waste Not, Want Not: The Potential for Urban Water Conservation in California
    Urban Conservation, Water Conservation
Water, Water, Everywhere?


Water 2020: water supply planning: summary report on groundwater modeling subgroups for areas I, II, and V
    water planning, supply planning, summary report, groundwater subgroups, modeling subgroups, areas I
Water 2020 constraints handbook
Water audits and leak detection (AWWA manual)
    Water audits, leak detection, AWWA manual
Water Conservation: Preventing and reducing wasteful, uneconomical, impractical, or unreasonable use of water resources (Section 62-40.412(1), F.A.C.)
    Water Conservation, water resources
Water Conservation & Water Consumption in Spring Hill, Fl
    Water Conservation, Water Consumption, Spring Hill
Water Conservation & Water Consumption in Spring Hill, Fl
    Water Conservation, Water Consumption, Spring Hill
Water Conservation Best Management Practices (BMP) Guide for Agriculture in Texas
    Water Best, Conservation Best, Management Guide, Practices Guide
Water Conservation Digital Library
    Water Library, Conservation Library, Digital Library
Water Conservation Library: Content Organization
    Water Library, Conservation Library, Content Organization
Water Conservation Library UF Library Collaboration
    Water Collaboration, Conservation Collaboration, Library UF Collaboration, Library Collaboration
Water Conservation Measures Appendix A
    Water Measures


    Conservation Measures, Appendix A
Water Conservation Potential, A Discussion of Demand Management Benchmarks and Targets
    Water Conservation, Management Benchmarks
Water Conservation Promoting Rate Structure Computer Model: User Manual
    Water Model, Conservation Model, Promoting Model, Rate Model, Structure Model, Computer Model, User Manual
Water Conservation Strategic Plan
    Water Plan, Conservation Plan, Strategic Plan
Water Conservation Through Xeriscape Landscaping Demonstration
    Water Conservation, Xeriscape Demonstration, Landscaping Demonstration
Water resource allocation and quality optimization modeling: final report
    resource allocation, quality modeling, optimization modeling
Water Supply, Developing Sustainable Water Supplies to Meet Current and Future Demands
    Water Supply, Sustainable Supplies, Water Supplies
Water supply assessment 1998 St. Johns River Water Management District
    Water assessment, supply assessment, St. District, Johns District, River District, Water District, Management District
Water supply assessment 2003: St. Johns River Water Management District
    Water 2003, supply 2003, assessment 2003


    St. District, Johns District, River District, Water District, Management District
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: application of planning level cost estimating procedure
    Water Supply, Alternative Water, Supply Investigation, Strategies Investigation, cost procedure, estimating procedure
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: aquifer storage and recovery utility evaluations
    Supply Needs, Alternative Investigation, Water Investigation, Supply Investigation, Strategies Investigation, recovery evaluations, utility evaluations
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: assessment of the cost effectiveness of specific water conservation practices
    Water Supply, Alternative Water, Supply Investigation, Strategies Investigation, cost effectiveness, water practices, conservation practices
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: assessment of the cost of supplying reclaimed water to areas of high agricultural withdrawals
    Water Needs, Supply Needs, Alternative Investigation, Water Investigation, Supply Investigation, Strategies Investigation
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: A Tool for Assessing the Feasibility of Aquifer Storage Recovery


    Water Needs, Supply Needs, Sources Assessment, Alternative Water, Supply Investigation, Strategies Investigation, Aquifer Recovery, Storage Recovery
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: brackish groundwater: planning level cost estimates
    Supply Needs, Alternative Investigation, Water Investigation, Supply Investigation, Strategies Investigation, cost estimates
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: brackish groundwater: source identification and assessment
    Water Supply, Alternative Investigation, Water Investigation, Supply Investigation, Strategies Investigation, source identification
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: brackish groundwater treatment technology assessment
    Supply Needs, Alternative Investigation, Water Investigation, Supply Investigation, Strategies Investigation, groundwater assessment, treatment assessment, technology assessment
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: Effects of Water Use Restrictions on Actual Water Use
    Water Supply, Alternative Water, Supply Investigation, Strategies Investigation, Water Restrictions, Use Restrictions, Water Use


Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: implementation of water conservation rate structures
    Water Supply Alternative Water Supply Investigation Strategies Investigation water structures conservation structures rate structures
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: review of established minimum flows and levels for the Wekiva River system
    Water Supply Alternative Water Supply Investigation Strategies Investigation Wekiva system River system
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: Surface Water Availability and Yield Analysis
    Supply Needs Alternative Investigation Water Investigation Supply Investigation Strategies Investigation Surface Availability Water Availability Yield Analysis
Water supply needs and sources assessment: alternative water supply strategies investigation: surface water data acquisition and evaluation methodology
    supply needs sources assessment water strategies supply strategies surface acquisition water acquisition data acquisition evaluation methodology
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: Surface Water Withdrawal Sites
    Supply Needs Alternative Investigation Water Investigation Supply Investigation


    Strategies Investigation Surface Sites Water Sites Withdrawal Sites
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: Systems Interconnection Methodology
    Supply Needs Alternative Investigation Water Investigation Supply Investigation Strategies Investigation Systems Methodology Interconnection Methodology
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: Water Supply and Wastewater System Component Cost Information
    Water Supply Alternative Water Supply Investigation Strategies Investigation Water Supply Wastewater Information System Information Component Information Cost Information
Water Supply Needs and Sources Assessment: Alternative Water Supply Strategies Investigation: Wetlands Impact, Mitigation, and Planning level Cost Estimating Procedure
    Water Supply Alternative Water Supply Investigation Strategies Investigation Wetlands Impact Planning level Procedure Cost Procedure Estimating Procedure
Water supply needs and sources assessment 1994 St. Johns River Water Management District
    Water needs supply needs St. River Johns River Water District Management District


Water Supply Needs and Sources Assessment Alternative Water Supply Strategies Investigation Replacement of Potable Quality Water for Landscape Irrigation
    Water Needs Supply Needs Sources Supply Assessment Supply Alternative Supply Water Supply Strategies Replacement Investigation Replacement Potable Water Quality Water Landscape Irrigation
Water Use Efficiency Comprehensive Evaluation
    Water Evaluation Use Evaluation Efficiency Evaluation Comprehensive Evaluation
Water use in the St. Johns River Water Management District, 1997
    St. District Johns District River District Water District Management District
Water use in the St. Johns River Water Management District, 1998
    St. District Johns District River District Water District Management District
Water use in the St. Johns River Water Management District, 2000
    St. District Johns District River District Water District Management District
Water Use Permit Information Manual Part B Basis of Review
    Water Permit Use Permit Information Basis Manual Basis Part Basis B Basis
WaterWiser Glossary of Common Water Terms
    WaterWiser Glossary Water Terms


Water Withdrawals, Use, Discharge, and Trends in Florida, 2000
    Water Withdrawals
Water Words Dictionary
    Water Dictionary Words Dictionary
West Central Florida's Aquifers
    West Central Florida

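The abstract counts reported in Appendix B (how many times each noun phrase extracted from a title occurs in that publication's abstract) could be computed along the following lines. This is a minimal Python sketch, not the Java application described in the appendix. The function names `count_phrase` and `count_title_phrases` and the sample abstract are hypothetical, and case-insensitive whole-word matching is an assumption, since the source does not state how occurrences were matched.

```python
import re


def count_phrase(phrase: str, abstract: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase in an abstract.

    Assumption: words of the phrase may be separated by any run of whitespace.
    """
    words = [re.escape(w) for w in phrase.lower().split()]
    if not words:
        return 0
    # Lookarounds keep "water use" from matching inside "wastewater use".
    pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
    return len(re.findall(pattern, abstract.lower()))


def count_title_phrases(phrases, abstract):
    """Map each title noun phrase to its number of occurrences in the abstract."""
    return {p: count_phrase(p, abstract) for p in phrases}


# Hypothetical abstract text, used only to exercise the functions.
sample_abstract = "Root sprouts compete with the tree. Remove root sprouts early."
counts = count_title_phrases(["pest guide", "root sprouts"], sample_abstract)
print(counts)
```

A phrase that appears only in the title, such as "pest guide" here, gets a count of 0, which matches the shape of the entries in the appendix.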

APPENDIX B
TITLE PARSING RESULTS BMP PUBLICATION LIBRARY

This appendix contains a list of two-word noun phrases extracted from publication titles in the BMP Publication Library. The noun phrases were obtained with a Java application developed using libraries available in the Stanford Parser package with the English PCFG grammar. Additionally, each noun phrase was checked for whether, and how many times, it occurs in the abstract of the publication it was extracted from. The list below gives the title of each publication followed by the noun phrases extracted from that title. The numbers show how many times the noun phrase occurs in the abstract.

1993 Citrus Water Use Report
    water report 0
    use report 0
2000 Florida Citrus Pest Management Guide: Alternaria Brown Spot
    pest guide 0
    management guide 0
    alternaria spot 0
2000 Florida Citrus Pest Management Guide: Approaches to Processed and Fresh Fruit Management
    florida citrus 0
    management guide 0
    fruit management 0
2000 Florida Citrus Pest Management Guide: Brown Rot of Fruit
    florida citrus 0
    management guide 0
2000 Florida Citrus Pest Management Guide: Citrus Leafminer
    pest guide 0
    management guide 0
2000 Florida Citrus Pest Management Guide: Citrus Root Sprouts
    pest guide 0
    management guide 0
    root sprouts 2
2000 Florida Citrus Pest Management Guide: Citrus Root Weevils
    pest guide 0
    management guide 0
    root weevils 1
2000 Florida Citrus Pest Management Guide: Citrus Rust Mites
    pest guide 0
    management guide 0


    rust mites 1
2000 Florida Citrus Pest Management Guide: Eastern Subterranean Termite
    pest guide 0
    management guide 0
2000 Florida Citrus Pest Management Guide: Greasy Spot
    pest guide 0
    management guide 0
2000 Florida Citrus Pest Management Guide: Interpreting PPE Statements on Pesticide Labels
    florida citrus 0
    management guide 0
    ppe statements 1
    pesticide labels 1
2000 Florida Citrus Pest Management Guide: Management of Soil Applied Agricultural Chemicals on Ridge Citrus
    management guide 0
2000 Florida Citrus Pest Management Guide: Melanose
    pest guide 0
    management guide 0
2000 Florida Citrus Pest Management Guide: Other Insect Pests
    pest guide 0
    management guide 0
    insect pests 0
2000 Florida Citrus Pest Management Guide: Pesticide Application Technology Foliar
    pest guide 0
    management guide 0
    pesticide technology foliar 0
    application technology foliar 0
2000 Florida Citrus Pest Management Guide: Pesticide Resistance and Resistance Management
    florida citrus 0
    management guide 0
    pesticide resistance 0
    resistance management 0
2000 Florida Citrus Pest Management Guide: Pesticides Registered for Use on Florida Citrus
    pest guide 0
    management guide 0
    florida citrus 1
2000 Florida Citrus Pest Management Guide: Phytophthora Foot Rot and Root Rot
    florida citrus 0
    management guide 0
    phytophthora rot 0
    foot rot 1


    root rot 1
2000 Florida Citrus Pest Management Guide: Plant Growth Regulators
    pest guide 0
    management guide 0
    plant regulators 0
    growth regulators 0
2000 Florida Citrus Pest Management Guide: Post bloom Fruit Drop
    pest guide 0
    management guide 0
    post drop 0
    bloom drop 0
    fruit drop 1
2000 Florida Citrus Pest Management Guide: Postharvest Decay Stem End Rot and Green Mold
    florida citrus 0
    management guide 0
2000 Florida Citrus Pest Management Guide: Scale Insects
    pest guide 0
    management guide 0
    scale insects 3
2000 Florida Citrus Pest Management Guide: Schedule for Timing of Fungicides for Disease Control
    florida citrus 0
    management guide 0
    disease control 1
2000 Florida Citrus Pest Management Guide: Spider Mites
    pest guide 0
    management guide 0
    spider mites 1
2000 Florida Citrus Pest Management Guide: The Use of Pesticides in Citrus IPM
    florida citrus 1
    management guide 0
2000 Florida Citrus Pest Management Guide: Weeds
    pest guide 0
    management guide 0
2000 Florida Citrus Pest Management Guide: Worker Protection Standards
    pest guide 0
    management guide 0
    worker standards 0
    protection standards 0
2008 Florida Citrus Pest Management Guide: Pesticide Application Technology
    pest guide 0
    management guide 0


    pesticide technology 0
    application technology 0
A Comparison of Controlled Release to Conventional Fertilizer on Mature Marsh Grapefruit
    marsh grapefruit 0
A Comparison of Effective Rainfall Calculations Using the SCSi TR2t Method and AFSIRS
A Functional Assessment of South Florida Freshwater Wetlands and Models for Estimates of Runoff and Pollution Loading
    florida freshwater 0
    runoff loading 0
A Guide for Plastic Tile Drainage in Florida Citrus Groves
    florida groves 0
    citrus groves 0
A Landowner's Handbook for Controlling Erosion from Forestry Operations
    forestry operations 1
A Manual of Reference Management Practices for Agricultural Activities
    reference practices 0
    management practices 0
A Pesticide Surface Water Mobility Index and its Relationship with Concentrations in Agricultural Drainage Watersheds
    pesticide surface 0
    water index 0
    mobility index 1
    drainage watersheds 0
A Simplified Approach to Predict Surface Runoff and Water Loss Hydrographic
    surface runoff 3
    water hydrographic 0
    loss hydrographic 0
A Study of Crown Flood Irrigation Methods
    crown methods 0
    flood methods 0
    irrigation methods 0
ASSESSING THE NONPOINT SOURCE POLLUTANT REMOVAL EFFICIENCIES OF A TWO BASIN STORM WATER MANAGEMENT SYSTEM IN AN URBANIZING WATERSHED
    removal efficiencies 0
    water system 0
    management system 0
Accuracy Requirements for Climate Factors Used to Estimate Reference Evapotranspiration in a Humid Region
    accuracy requirements 0
    climate factors 0
    reference evapotranspiration 1


Adsorption and Transport of Nitrate and Bromide in a Spodosol
Aerial Pesticide Drift Management
Aerobic and Anaerobic Degradation of Organic Contaminants in Florida Groundwater
    florida groundwater 0
Agricultural Aircraft Calibration and Setup for Spraying
Agricultural Aircraft Spreader Setup
    aircraft setup 0
    spreader setup 0
Agricultural BMPs for Phosphorus Reduction in South Florida
Agricultural Chemical Drift and Its Control
    chemical drift 0
Agricultural Demonstration Project "Comparing Water Management Systems for Commercial Tomato Production during the Fall 1988 Season"
    demonstration project 0
    water systems 0
    management systems 1
    tomato production 0
    fall season 0
Agricultural Irrigation Monitoring Program (1989 1992)
    irrigation program 0
    monitoring program 0
Agricultural Phosphorus and Eutrophication Second Edition
Agricultural Tree Crop Pest Control Pesticide Applicator Training Manual
    tree control 0
    crop control 0
    pest control 2
    pesticide manual 0
    applicator manual 0
    training manual 0
Agricultural Water Use Modeling
    water modeling 0
    use modeling 0
Agricultural Water use Estimation using Geospatial Modeling and a Geographic Information System
    information system 1
Ammonia Volatilization from Different Fertilizer Sources and Effects of Temperature and Soil pH.
    ammonia volatilization 1
    fertilizer sources 0
    temperature ph. 0
Ammonium Absorption and Desorption in Sandy Soils
An Atlas of the Everglades Agricultural Area Surface Water Management Basins
    area basins 0


    surface basins 0
    water basins 0
    management basins 2
An Evaluation of Nutrient Removal by Citrus Fruits
An Investigation of the St. Johns Water Control District: Reservoir Water Quality and Farm Practices
    st. district 0
    johns district 0
    water district 0
    control district 1
    reservoir quality 0
    water quality 2
    farm practices 0
Analysis and Design of Stormwater BMPs
    stormwater bmps 1
Analysis of Efficiency of Overhead Irrigation in Container Production
    container production 0
Applying Pesticides Correctly A Guide for Pesticide Applicators
    pesticide applicators 0
Aquatic Ecosystems: Harbingers of Endocrine Disruption
Aquatic Weed Terms, Definitions and Abbreviations
    weed terms 1
Assessment and management of long term nitrate pollution of ground water in agriculture dominated watersheds
    nitrate pollution 0
    ground water 4
Atmospheric Deposition of Nitrogen in a High Lightning Intensity Area
    lightning area 0
    intensity area 0
Basic Irrigation Scheduling in Florida
    irrigation scheduling 0
Basic Pesticide Training A Guide for Pest Control Technicians
    pesticide training 0
    control technicians 0
Basinwide Water Requirement Estimation in Southern Florida
    basinwide estimation 0
    water estimation 0
    requirement estimation 0
Behavior of Pesticides in Soils and Water
Best Management Practices for Agrichemical Handling and Farm Equipment Maintenance
    management practices 1
    farm maintenance 2
    equipment maintenance 0


Best Management Practices to Protect Groundwater from Agricultural Pesticides
    management practices 0
Best Management Practices to Reduce Water Pollution from Forestry
    management practices 1
    water pollution 2
Best Nitrogen and Irrigation Management Practices for Citrus Production in Sandy Soils
    nitrogen practices 0
    management practices 3
Biological Control with Grass Carp
Biological Control with Insects: The Alligator Weed Flea Beetle
    weed beetle 0
    flea beetle 1
Biological Control with Insects: The Alligator weed Stem borer
    weed stem borer 0
Biological Control with Insects: The Alligatorweed Thrips
Biological Control with Insects: The Asian Hydrilla Moth
    hydrilla moth 0
Biological Control with Insects: The Hydrilla Leaf mining Flies
Biological Control with Insects: The Hydrilla Stem Weevil
    stem weevil 0
Biological Control with Insects: The Hydrilla Tuber Weevil
    tuber weevil 0
Biological Control with Insects: The Melaleuca Snout Beetle
Biological Control with Insects: The Waterhyacinth Moth
Biological Control with Insects: The Waterhyacinth Weevils
Biological Control with Insects: The Waterlettuce Moth
Biological Control with Insects: The Waterlettuce Weevil
Biomass Distribution and Nitrogen 15 Partitioning in Citrus Trees
    distribution partitioning 0
Biscayne Bay Water Quality Monitoring Network
    biscayne quality 0
    bay quality 0
    water quality 5
Brighton Reservation Monitoring
    reservation monitoring 0
Broadcast Boom Sprayer Calibration
    broadcast calibration 0
    boom calibration 0
    sprayer calibration 0
Broadcast Boom Sprayer Nozzle Uniformity Check
    boom check 0
    sprayer check 0
    nozzle check 0


    uniformity check 0
Building Plans and Management Practices for a Permanently Sited Agricultural Pesticide Mixing Loading Facility
    management practices 0
By Product Iron Humate Increases Tree Growth and Fruit Production of Orange and Grapefruit
    tree growth 1
    fruit production 0
CHARACTERIZATION OF VELVETBEAN (MUCUNA PRURIENS) LINES FOR COVER CROP USE
    mucuna pruriens 1
    cover use 0
    crop use 0
CONCEPTUAL EVALUATION OF CONTAINERIZED TOMATO PRODUCTION
    tomato production 0
CONSIDERATIONS IN CALCULATING AND REPORTING WATER APPLICATION RATES FOR DRIP IRRIGATED LANDS
    water rates 0
    application rates 0
    drip lands 0
CONVERTING SOIL PHOSPHORUS READINGS BASED ON MECHLICH 3 EXTRACTIONMETHODS INTO MECHLICH 1 IN TWO FLORIDA FLATWOODS CITRUS GROVES
    citrus groves 0
COUNTY FACULTY IN SERVICE TRAININGFOR WATER SAMPLING AND CHEMICAL ANALYSIS
    county faculty 0
    trainingfor sampling 0
    water sampling 1
    chemical analysis 0
Central and Southern Florida Flood Control
    florida control 0
    flood control 0
Characteristics of Potential Evapo Transpiration in Florida
Citrus Irrigation Conservation Demonstration
    citrus demonstration 0
    irrigation demonstration 0
    conservation demonstration 0
Citrus Irrigation Management
Classifying Landscape Ornamental Species into Water Use Groups Using Coefficients of Water Use Efficiency
    landscape species 0
    water groups 0
    use groups 0
    water efficiency 0
    use efficiency 2


Comparative Pollutant Removal Capability of Urban BMPs: A Reanalysis
    pollutant capability 0
    removal capability 2
Comparison of Mehlich 3, Mehlich 1, Ammonium Bicarbonate DTPA, 1.0m Ammonium Acetate, and 0.2m Ammonium Chloride for Extraction of Calcium, Magnesium, Phosphorus, and Potassium for Wide Range of Soil
    ammonium bicarbonate dtpa 0
    m acetate 1
    ammonium acetate 1
    m chloride 1
    ammonium chloride 1
    calcium potassium 0
Comparison of Methods of Evapotranspiration Estimation
    evapotranspiration estimation 0
Comparison of Two Pesticide Leaching Indices
    pesticide indices 0
    leaching indices 0
Conservation Area Inflows and Outflows
    conservation inflows 0
    area inflows 0
Containerized Strawberry Transplants Reduce Establishment period Water Use and Enhance Early Growth and Flowering Compared with Bare root Plants
    strawberry transplants 0
    water use 1
Controlled Release Fertilizer Use on Young "Hamlin" Orange Trees
    hamlin trees 0
    orange trees 0
Critical Leaf Freezing Point Temperatures for Florida Citrus. A Fact Sheet for Citrus Growers
    point temperatures 0
    florida citrus 1
    fact sheet 0
DEVELOPMENT OF A PROTECTED CULTURE PRODUCTION SYSTEM TO ELIMINATE THE USE OF OVERHEAD SPRINKLER IRRIGATION FOR FREEZE PROTECTION IN FLORIDA STRAWBERRIES.
    culture system 1
    production system 1
    sprinkler irrigation 0
    freeze protection 0
    florida strawberries 0
DIURNAL CHANGES IN DEWPOINT TEMPERATURE DURING THE FREEZE SEASON IN NORTH FLORIDA
    freeze season 0
    north florida 1


Denitrification from Sandy Soils Treated with Liquid or Dry Granular Nitrogen Form
    nitrogen form 0
Denitrification in the Vadose Zone and in Surficial Groundwater of a sandy Entisol with Citrus Production
Desalination A Viable Alternative in South Florida's Water Resources Planning
    water planning 0
    resources planning 0
Design and Evaluation of an Automated Water Table Control System for use with SeepageSubirrigation System
    water system 0
    table system 0
    control system 2
    seepagesubirrigation system 0
Development of phosphorus indices for nutrient management planning strategies in the United States.
    management strategies 0
    planning strategies 0
EFFECTS OF MICRO IRRIGATION MANAGEMENT PRACTICES ON FLORIDA FLATWOODS GRAPEFRUIT PRODUCTION
    irrigation practices 0
    management practices 1
    florida flatwoods 1
EFFICIENCY OF DRIP APPLIED HERBICIDES IN TOMATO IN PUERTO RICO AND DOMINICAN REPUBLIC
    dominican republic 2
ESTABLISHING A BASELINE: PRE RESTORATION STUDIES OF THE CHANNELIZED KISSIMMEE RIVER
    kissimmee river 2
ESTIMATING WATER REQUIREMENTS FOR IRRIGATING FLORIDA VEGETABLES
    water requirements 4
    florida vegetables 0
EVALUATION OF CONTAINERIZED TOMATO PRODUCTION: PRODUCTION POTENTIAL AND ECONOMIC FEASIBILITY
    tomato production 0
    production potential 0
EVALUATION OF WET DETENTION FOR TREATMENT OF SURFACE WATER RUN OFF FROM A CITRUS GROVE IN SOUTH FLORIDA
    surface water 1
Economic Analysis of Ethanol Production from Citrus Peel Waste
    ethanol production 0
Economics of Controlled Released Fertilizer Use on Young Citrus Trees
    citrus trees 0
Economics of Surface Water Runoff Storage in Brackish Aquifers in South Florida


    surface storage 1
    water storage 1
    runoff storage 0
Effect of Foliar Applications of Ascorbic Acid plus Ferrous Sulfate on Leaf Greenness of 'Arkin' Carambola (Averrhoa carambola L.) Trees
    foliar applications 0
    acid ferrous 0
    plus ferrous 1
    leaf greenness 2
    carambola trees 2
    averrhoa trees 0
    carambola trees 2
    l. trees 1
Effect of Nitrogen Rate on Yield of Tomato Grown with Seepage Irrigation and Reclaimed Water
    nitrogen rate 0
Effect of Winter and Springtime Applications of Foliar Urea, NPK, or K Phosphite Sprays on Productivity of Citrus in Central Florida
    springtime applications 0
    foliar urea 0
Effective Rainfall in Poorly Drained Micro Irrigated Citrus Orchards
    citrus orchards 0
Effectiveness of Fall Potassium Sprays on Enhancing Grapefruit Size
    fall sprays 0
    potassium sprays 1
Effects of Flooding and Irrigation Frequency on Growth and Yield of Drip Irrigated Tomatoes
    flooding frequency 0
    drip tomatoes 0
Effects of Micro Irrigation Frequency on Florida Grapefruit
    irrigation frequency 0
Effects of Nitrogen Fertilization of Grapefruit Trees on Soil Acidification and Nutrient Availability in a Riviera fine sand
    nitrogen fertilization 0
    soil acidification 4
    nutrient availability 2
    riviera sand 0
    fine sand 2
Effects of Nitrogen Rates on Dry Matter and Nitrogen Accumulation in Citrus Fruit and Fruit Yield
    nitrogen rates 0
    nitrogen accumulation 1
    fruit yield 3


Effects of Soil Series on Shallow Water Table Fluctuations in Bedded Citrus
    soil series 3
    shallow fluctuations 0
    water fluctuations 0
    table fluctuations 0
Effects of Volume of the Root Zone Irrigated on Water Use and Yield of Citrus
    root zone 3
    water use 1
Effects of a Managed Three Zone Riparian Buffer System on Shallow Groundwater Quality in the Southeastern Coastal Plain.
    buffer system 0
    shallow quality 0
    groundwater quality 0
Environmental Studies in the Chandler Slough Watershed
    chandler watershed 0
    slough watershed 1
Estimating Crop Irrigation Requirements for Irrigation System Design and Consumptive Use Permitting
    crop requirements 0
    irrigation requirements 0
    irrigation design 0
    system design 0
    consumptive use 0
Estimating Irrigation Water Requirements
    estimating requirements 0
    irrigation requirements 0
    water requirements 0
Estimating Potential Evapotranspiration from Temperature in a Humid Region
Estimating Water Requirements of Landscape Planting: The Landscape Coefficient Method
    water requirements 0
    landscape planting 0
    landscape method 0
    coefficient method 0
Estimation of Effective Rainfall for Subirrigated Fields in SW Florida
    sw florida 0
Estimation of Nitrate Leaching in an Entisol under Optimum Citrus Production
Evaluation and Enhancement of Grass Waterway Filter Strips Used for Citrus and Vegetable Production
    grass strips 0
    waterway strips 0
    filter strips 3


    vegetable production 0
Evaluation of Phosphorus Loading Models for South Florida
    loading models 1
Evaluation of a Resin Coated Nitrogen Fertilizer for Young Citrus Trees on a Deep Sand
    resin fertilizer 0
    nitrogen fertilizer 0
Evaluation of the Potential of Recovered Tail Water to Serve as a Source of Pathogens for Vegetable Crops
    vegetable crops 0
Evaporation Effects on Sprinkler Irrigation Efficiencies
    evaporation effects 0
    sprinkler efficiencies 0
    irrigation efficiencies 0
Evapotranspiration (ET) and Net Irrigation Requirements (NIR) for Crops in South and Central Florida
    irrigation requirements 0
Evapotranspiration Estimation in Puerto Rico
    evapotranspiration estimation 0
    puerto rico 3
Evapotranspiration Estimations for Wetlands and Shallow Open Water Systems in South Florida: Documentation for C Program etcalcs
    evapotranspiration estimations 0
    program etcalcs 0
Evapotranspiration by Young Florida Flatwoods Citrus Trees
    florida flatwoods 1
Evapotranspiration from a Humid Region Developing Citrus Grove with Grass Cover
    grass cover 0
Evapotranspiration of Vegetation of Florida: Perpetuated Misconceptions versus Mechanistic Processes
Everglades Nutrient Removal Project Test Cell Research: Optimizing Stormwater Treatment Area Performance The Importance of Hydrologic Conditions in Maximizing Nutrient Retention by the STAs
    nutrient research 0
    removal research 0
    project research 0
    test research 0
    cell research 0
    stormwater performance 0
    treatment performance 0
    area performance 0
FLOODING AND DRIP lRRIGATION FREQUENCY EFFECTS ON TOMATOES IN SOUTH FLORIDA.
    frequency effects 0


FREQUENT FERTIGATION DOES NOT AFFECT CITRUS TREE GROWTH, FRUIT YIELD, NITROGEN UPTAKE, AND LEACHING LOSSES
    tree growth 1
    fruit yield 1
    nitrogen uptake 0
Factors Influencing Adoption of Energy and Water Conserving Irrigation Technologies in Florida
    energy technologies 0
    irrigation technologies 2
Factors Influencing Pesticide Movement to Ground Water
    pesticide movement 0
    ground water 0
Fate of Pesticides in Florida's Forests: An Overview of Potential Impacts on Water Quality
    water quality 1
Fertilization Recommendations for Trees and Shrubs in Home and Commercial Landscapes
    fertilization recommendations 0
    home landscapes 0
Fertilizer Effects on Early Growth and Yield of "Hamlin" Orange Trees
    fertilizer effects 0
    orange trees 0
Fertilizer Management Key to a Sound Water Quality Program
    fertilizer management 0
    water quality 0
Fertilizer Rates Change Root Distribution of Grapefruit Trees on a Poorly Drained Soil
    fertilizer rates 0
    change distribution 0
    root distribution 1
Field Demonstration of Microsprinkler Irrigation and Frost Protection Systems in Citrus County
    field demonstration 0
    microsprinkler irrigation 3
    frost systems 0
    protection systems 0
Field Evaluation of Fully Enclosed Subsurface Irrigation
    field evaluation 0
    subsurface irrigation 1
Final Feasibility for Determination of Crop Water Requirements using Sap Flow Technology
    crop requirements 0
    water requirements 1
    flow technology 0
First Year Response of "Ruby Red" Grapefruit on Four Root Stocks to Fertilization and Salinity


    root stocks 0
Fish Containment Barriers
    containment barriers 0
Flood Irrigation Studies with Citrus
    irrigation studies 1
Florida Citrus Aquatic Weed Management Guide
    florida citrus 0
    weed management 0
Florida Department of Environmental Protection Aquatic Plant Management Permits
    florida department 1
    protection permits 0
    plant permits 0
    management permits 0
Foliar Nutrient Sprays Influence Yield and Size of 'Valencia' Orange
    foliar sprays 1
    nutrient sprays 0
    influence yield 0
Freeze Protection of Crops in Southwest Central Florida
Fruit Nutrient Accumulation of Four Orange Varieties during Fruit Development
    fruit development 1
Fully Enclosed Sub irrigation for Water Table Management
    water management 0
    table management 1
General Information about Aquatic Weeds
General Principles of Weed Management
    weed management 0
Geo Referenced Ground Photography of Citrus Orchards to Estimate Yield and Plant Stress for Variable Rate Technology
    ground photography 0
    yield stress 0
    rate technology 0
Grower Attitudes and Perceptions of Lower Quality Water Problems in Citrus Production
    grower attitudes 0
    quality problems 0
    water problems 0
Growth, Evapotranspiration, and Nitrogen Leaching from Young Lysimeter Grown Orange Trees
    growth nitrogen 0
    orange trees 0
Herbicide Calibration and Application
    herbicide calibration 0
Herbicide Transport in a Restored Riparian Forest Buffer System
    herbicide transport 2


    riparian system 0
    forest system 0
    buffer system 4
Herpetofaunal Responses to Restoration Treatments of Longleaf Pine Sandhills in Florida
    restoration treatments 0
    longleaf sandhills 0
    pine sandhills 1
High Application Rates of Reclaimed Water Benefit Citrus Tree Growth and Fruit Production
    application rates 1
    water benefit 0
    tree growth 0
    fruit production 3
How Nitrogen Supply Affects Growth and Nitrogen Uptake, Use Efficiency, and Loss from Citrus Seedlings
    growth uptake 0
    use efficiency 1
Hydrogeologic Data Collected from the Upper East Coast Planning Area
    coast area 0
    planning area 1
Hydrologic Aspects of On Site Retention Systems for Urban Storm Runoff
    retention systems 0
    storm runoff 0
IMPACT OF A RAISED WATER TABLE ON DRIP IRRIGATED TOMATOES
    water table 6
INCORPORATING ISOLATED WETLANDS IN SURFACE WATER MANAGEMENT SYSTEMS ASSOCIATED WITH CITRUS PRODUCTION
    surface systems 0
    water systems 0
    management systems 1
INFLUENCE OF VARIOUS PHOSPHORUS AND POTASSIUM RATES ON JUICE VITAMIN C, CAROTENE, LYCOPENE ANDSUGARCONCENTRATIONS OF FLAME GRAPEFRUIT
    potassium rates 0
    juice vitamin 0
    flame grapefruit 3
INORGANIC NITROGEN, PHOSPHORUS, AND SEDIMENT LOSSES FROM A CITRUS GROVE DURING STORMWATER RUNOFF
    stormwater runoff 0
Impact of Agrichemical Facility Best Management Practices on Runoff Water Quality
    management practices 0
    runoff quality 0
    water quality 2


Impact of Alternative Citrus Management Practices on Ground Water Nitrate in the Central Florida Ridge: II Numerical Modeling
    ground nitrate 0
    water nitrate 4
    florida ridge 1
Impact of Alternative Citrus Management Practices on Groundwater Nitrate in the Central Florida Ridge I. Field Investigation
    management practices 2
    groundwater nitrate 3
    florida investigation 0
    ridge investigation 0
    i. investigation 0
    field investigation 0
Impact of Nitrogen Management Practices on Nutritional Status and Yield of Valencia Orange Trees and Groundwater Nitrate
    nitrogen practices 0
    management practices 1
    valencia trees 0
    orange trees 1
    groundwater nitrate 0
Impact of Reduced Water Table Depth on Subirrigated Vegetable Production
    water depth 0
    table depth 0
    vegetable production 1
Impact of water table depth on vegetable production using seepage sub irrigation system
    water depth 0
    table depth 0
    vegetable production 0
    seepage system 1
    sub system 0
    irrigation system 1
Improving Irrigation Management in Container Grown Landscape Ornamentals
    irrigation management 0
    landscape ornamentals 1
Improving Seepage Irrigation Efficiency for Potato Production using Automatic Subsurface Drip Irrigation Systems
    irrigation efficiency 0
    potato production 0
    subsurface systems 0
    drip systems 0
    irrigation systems 1
Indian River County: Soil Ratings for Selecting Pesticides
    indian county 0


    river county 1
    soil ratings 2
Indian River Lagoon
    river lagoon 2
Integration of Geographic Information Systems and a Computer Model to Evaluate Impacts of Agricultural Runoff on Water Quality
    information systems 0
    computer model 0
    water quality 1
Introduction to Best Management Practices for Phosphorus Control on Organic Soils
    management practices 0
Investigation and Development of Methods to Determine Urban Landscape Irrigation for Planning and Permitting in Central Florida
    landscape irrigation 0
Investigation of Potato Water Use in the Tri County Area of Putnam, St. Johns, and Flagler Counties, Florida
    potato water 0
    st. johns 2
    flagler counties 0
Investigations of the Relationship Between Land Use, Rainfall and Runoff Quality in the Taylor Creek Watershed
    use quality 0
    taylor watershed 0
    creek watershed 1
Irrigation Frequency and Depth to the Water Table Effects on Drip Irrigated Tomatoes
    irrigation frequency 2
    table effects 0
    drip tomatoes 0
    irrigated tomatoes 0
Irrigation Management Practices for Florida Golf Courses
    management practices 0
    florida courses 0
    golf courses 1
Irrigation Scheduling Tables For Florida Citrus
    scheduling tables 0
    florida citrus 0
Irrigation Systems and Water Management Practices Employed by Strawberry Growers in Southwest Florida
    water practices 0
    management practices 1
Irrigation Use by Mulched Staked Tomatoes in North Florida
    north florida 0
Irrigation Water Requirements


    water requirements 1
Irrigation Water Salinity Affects Soil Nutrient Distribution, Root Density, and Leaf Nutrient Levels of Citrus under Drip Fertigation
    irrigation water 3
    soil distribution 1
    nutrient distribution 0
    root density 1
    leaf levels 0
    nutrient levels 0
    drip fertigation 0
Irrigation of Fall Tomatoes in North Florida
    fall tomatoes 0
    north florida 2
Irrigation of Young Citrus Trees
Irrigation of Young Flatwoods Citrus Trees
KNO3 Foliar Applications to 'Sunburst' Tangerine
    foliar applications 1
Kissimmee River Eutrophication Abatement Project
    river project 0
    eutrophication project 0
    abatement project 0
LEAF AND FRUIT MINERAL CONTENT AND PEEL THICKNESS OF 'HAMLIN' ORANGE
    leaf mineral 0
LITERATURE REVIEW OF RESEARCH RELATED TO CITRUS NITROGEN NUTRITION, FERTILIZATION, AND POTENTIAL GROUNDWATER POLLUTION OF CITRUS
    literature review 0
    nitrogen nutrition 0
    groundwater pollution 0
LYSIMETER ET MEASUREMENTS FOR DEVELOPING 'VALENCIA' ORANGE TREES
    orange trees 0
LYSIMETERS FOR EVALUATING TOMATO IRRIGATION REQUIREMENTS
    irrigation requirements 0
Labeled Aquatic Sites for Specific Herbicides
Lake Okeechobee Littoral Zone
    lake zone 0
    okeechobee zone 0
    littoral zone 0
Lake Okeechobee Water Quality Studies and Eutrophication Assessment
    lake water 0
    okeechobee water 0
    quality studies 0
    eutrophication assessment 0


Leaching of Nitrogen Forms from Controlled Release Nitrogen Fertilizers
    nitrogen forms 0
    nitrogen fertilizers 0
Leaching of Nitrogen from Slow Release Urea Sources in Sandy Soils
    urea sources 0
MACHINE VISION BASED CITRUS YIELD MAPPING SYSTEM
    yield system 0
    mapping system 1
MANAGEMENT OF INVASIVE EXOTIC PLANTS WITH HERBICIDES IN FLORIDA
MANAGEMENT OF SUBSURFACE DRIP, LEPA AND FURROW IRRIGATION SYSTEMS FOR DRAINAGE REDUCTION
    drip systems 0
    irrigation systems 2
MODELING SOIL WATER REDISTRIBUTION AND EXTRACTION PATTERNS OF DRIP IRRIGATED TOMATOES ABOVE A SHALLOW WATER TABLE
    modeling water 0
    soil water 2
    redistribution patterns 0
    shallow table 0
    water table 2
Maintenance Guide for Florida Microirrigation Systems
    maintenance guide 0
    florida systems 0
    microirrigation systems 0
Maintenance, Care and Cleaning of Application Equipment
    application equipment 0
Management Alternatives to Reduce Groundwater Withdrawals During Frost and Freeze Periods for Strawberries
    management alternatives 0
    groundwater withdrawals 0
    frost periods 0
Management of Nutrients in Citrus Production Systems in Florida: An Overview
    production systems 0
Managing Citrus Trees to Optimize Dry Mass and Nutrient Partitioning
Managing Pesticides for Citrus Production and Water Quality Protection
    water protection 0
    quality protection 0
Manual Monitoring of Farm Water Tables
    farm tables 0
    water tables 0
Measured and Simulated Soil Water Redistribution and Extraction Patterns of Drip Irrigated Tomatoes Above a Shallow Water Table

    soil redistribution 0
    water redistribution 0
    extraction patterns 2
    shallow table 0
    water table 1
Measuring Soil Water Movement to Reduce Over Irrigation of Ridge Citrus
    measuring movement 0
    soil movement 0
    water movement 3
Micro sprinkler Irrigation Management What's Your Application Rate?
    micro management 0
    sprinkler management 0
    irrigation management 1
    application rate 3
Microirrigation Management to Reduce over Irrigation and Chemical Leaching
    microirrigation management 0
    irrigation leaching 0
Mobile Irrigation Lab Annual Report
    irrigation lab 3
Monitoring Nitrate nitrogen and Phosphorus in Porous: Nursery Containers and Adjacent Soil
    monitoring nitrate nitrogen 0
    nursery containers 1
Movement of Fluridone in the Upper St. Johns River, Florida
Multiple Cropping of Vegetables Using A Combined Microirrigation Seepage Irrigation System
    irrigation system 3
NITROGEN APPLICATION TIMING AND SOURCE FOR DRIP IRRIGATED TOMATOES
    nitrogen application 0
    drip tomatoes 0
    irrigated tomatoes 0
NITROGEN FERTILIZATION SCHEDULING OF HYDROPONICALLY GROWN "GALIA" MUSKMELON
    nitrogen scheduling 0
    fertilization scheduling 0
Nitrogen Availability to Citrus Seedlings from Urea and from Mineralization of Citrus Leaf or Compost
    nitrogen availability 0
Nitrogen Best Management Practice for Citrus Trees I. Fruit yield, Quality, and Leaf Nutritional Status
    management practice 1
    i. yield 0
    fruit yield 4

Nitrogen Management for High Yield and Quality of Citrus in Sandy Soils
    nitrogen management 0
Nitrogen Mineralization and Transformation from Composts and Biosolids during Field Incubation in a Sandy Soil
    nitrogen mineralization 0
    field incubation 1
Nitrogen Mineralization from Citrus Tree Residues under Different Production Conditions
    nitrogen mineralization 0
    tree residues 1
    production conditions 1
Nitrogen Recovery from Controlled Release Fertilizers under Intermittent Leaching and Dry Cycles
    nitrogen recovery 0
Nitrogen Release Patterns of a Mixed Controlled release Fertilizer and Its Components
    release patterns 2
Nitrogen Transformation and Ammonia Volatilization from Biosolids and Compost Applied to Calcareous Soil
    nitrogen transformation 0
    ammonia volatilization 1
Nitrogen Uptake Efficiency and Leaching Losses from Lysimeter Grown Citrus Trees Fertilized at Three Nitrogen Rates
    efficiency losses 0
    citrus trees 0
    nitrogen rates 0
Nitrogen Uptake and Growth of Two Citrus Rootstock Seedling in a Sandy Soil Receiving Different Controlled Release Fertilizer Sources
    nitrogen uptake 0
    fertilizer sources 0
Nitrogen Uptake by Citrus Leaves
    nitrogen uptake 0
Nitrogen Volatilization and Mineralization in a Sandy Entisol of Florida under Citrus
    nitrogen volatilization 0
Nitrogen best management practice for citrus trees II. Nitrogen fate, transport, and components of N budget
    management practice 2
    n budget 1
Nitrogen versus Phosphorus Limitation of Phytoplankton Growth in Ten Mile Creek, Florida, USA
    phytoplankton growth 0
    ten creek 0
    mile creek 1

No Runoff Watering Systems for Foliage and Flowering Potted Plant Production
    watering systems 0
    plant production 0
Nonpoint Source Components of Total Maximum Daily Loads
    source components 0
Nutrient Leaching Potential of Mature Grapefruit Trees in a Sandy Soil
    nutrient potential 0
    grapefruit trees 0
Nutrient Loads in Stream flow from Sandy Soils in Florida
    stream flow 2
Nutrient Management for Greenhouse Production of Container grown Organic Herbs
    greenhouse production 1
Nutrient and Sediment Removal by a restored Wetland Receiving Agricultural Runoff
ON FARM DEMONSTRATION OF SOIL WATER MOVEMENTIN VEGETABLES GROWN WITH PLASTICULTURE
    soil vegetables 0
    water vegetables 0
    movementin vegetables 0
OPTIMAL IRRIGATION AND LAND ALLOCATION FOR FRESH MARKET DRIP IRRIGATED TOMATOES DURING WATER SUPPLY LIMITATIONS
    land allocation 2
OPTIMIZATION OF DRAINAGE LYSIMETER DESIGNFOR FIELD DETERMINATION OF NUTRIENT LOADS
    designfor determination 0
    field determination 0
Overview of Cooperative Water Quality Studies in the Everglades Agricultural Area and Lake Okeechobee SFWMD and The Florida Sugar Cane League
    water studies 0
    quality studies 0
    lake okeechobee 1
    sugar league 0
    cane league 1
PALM BEACH COUNTY'S CHANGING LANDSCAPE: HISTORICAL TRENDS AND FUTURE DIRECTION OF THE AGRICULTURAL RESERVE
    palm county 0
    beach county 2
PRECISION AGRICULTURE TECHNOLOGIES FOR ALDICARBAPPLICATION
    agriculture technologies 0
PRODUCTION GUIDE FOR FLORIDA CHINESE LEAFY VEGETABLES, INTRODUCTION
    production guide 0
    florida chinese 0

Phosphorus Budget Land use relationships for the Northern Lake Okeechobee Watershed, Florida
    land relationships 0
    use relationships 0
    lake watershed 0
    okeechobee watershed 2
Phosphorus Concentrations and Loads in Runoff Water under Crop Production
    runoff water 6
    crop production 1
Phosphorus Loss and Runoff Characteristics in Three Adjacent Agricultural Watersheds with Clay pan Soils
    loss characteristics 0
Phosphorus and Heavy Metal Attachment and Release in Sandy Soil Aggregate Fractions
    soil fractions 0
    aggregate fractions 3
Please Release Me, Let Me Grow!
Post Bloom and Summer Foliar K Effects on Grapefruit Size
    post bloom 1
    summer foliar 1
    k effects 0
Potential Evapotranspiration Probabilities and Distributions in Florida
    evapotranspiration probabilities 0
Power Analysis of On farm Fertilizer Trials with Tomato
    power analysis 3
    fertilizer trials 2
Practical Management Alternatives to Lower Quality Water in Citrus
    management alternatives 0
    quality water 0
Preliminary Investigations of Periphyton and Water Quality Relationships in the Everglades Water Conservation Areas
    periphyton relationships 0
    quality relationships 0
    water areas 0
    conservation areas 1
Program Fertilization for Establishment of Orange Trees
    program fertilization 0
Proper Disposal of Pesticide Waste
    pesticide waste 1
Quantifying the agricultural landscape and assessing spatio temporal patterns of precipitation and groundwater use
    precipitation use 0
RELATING CITRUS CANOPY SIZE AND YIELD TO PRECISION FERTILIZATION
    precision fertilization 0

RESULTS OF RESEARCH AND RESPONSE OF CITRUS TO SUPPLEMENTAL IRRIGATION
RIDGE CITRUS NITRATE / GROUNDWATER NITRATE MONITORING PROJECT, PHASE I
    groundwater project 0
    nitrate project 0
    monitoring project 0
RIDGE CITRUS NITRITE / GROUNDWATER NITRATE MONITORING PROJECT : Phase 1
    citrus project 0
    nitrite project 0
    groundwater project 0
    nitrate project 0
    monitoring project 0
ROOTSTOCK EFFECTS ON MURCOTT TANGOR TREESGROWN IN A CALCAREOUS ALFISOL OR A SPODOSOL
    tangor treesgrown 0
Reclaiming Florida's Water
Recovery and Use of Runoff Water from Seep Sub irrigation on Florida Flatwoods Soils
    runoff water 0
    seep irrigation 0
    sub irrigation 1
    florida soils 0
    flatwoods soils 0
Reduced Irrigation of St. Augustine grass Turf grass in the Tampa Bay Area
    st. augustine 1
    grass grass 0
    turf grass 1
    tampa area 0
    bay area 0
Reduction of Deep Aquifer Withdrawals for Overhead Irrigated Strawberry Production using a Tailwater Recovery System: An Evaluation of Management Considerations and Design Characteristics
    aquifer withdrawals 0
    strawberry production 0
    tailwater system 0
    recovery system 2
    management considerations 0
    design characteristics 0
Reduction of Irrigation Runoff Using Alternative Management Methods for Seep Sub irrigation of Field Grown Vegetable Crops FINAL REPORT
    irrigation runoff 0
    management methods 0

    seep irrigation 2
    sub irrigation 0
    vegetable crops 0
Relation of Evaporation to Potential Evapotranspiration at Fort Lauderdale, Florida
Release Potential of Phosphorus in Florida Sandy Soils in Relation to Phosphorus Fractions and Adsorption Capacity
    release potential 3
    adsorption capacity 2
Response of Hamlin Orange to Fertilizer Source, Annual Rate and Irrigated Area
    hamlin orange 0
    fertilizer source 0
Response of Macro Invertebrates and Small Fish to Nutrient Enrichment in the Northern Everglades
Response of Young Citrus Trees to Irrigation
Ridge Citrus Water Quality Project Annual Report 1997 1998
    water project 0
    quality project 0
    report 1997 1998 0
Ridge Citrus Water Quality Project: Annual Progress Report (1999 2000)
    water project 0
    quality project 0
    progress report 0
Root Distribution of Grapefruit Trees under Dry Granular Broadcast vs. Fertigation Method
    root distribution 1
    fertigation method 0
Runoff Characteristics of Row Cropped Vegetable Production on Flatwoods Soils as Affected by Bed Covering
    runoff characteristics 1
    row production 0
    cropped production 0
    vegetable production 0
    bed covering 0
SCHEDULING CITRUS IRRIGATION
SOIL MOISTURE IN THE POTATO ROOT ZONE UNDER SEEPAGE IRRIGATION
    soil moisture 5
    potato zone 0
    root zone 2
SUBIRRIGATION BY MICROIRRIGATION
SURFACE WATER QUALITY MONITORING NETWORK OPTIMIZATION COMPREHENSIVE REPORT
    surface quality 0
    water quality 0
    monitoring optimization 0

    network optimization 0
Salinity Reduces Water Use and Nitrate N Efficiency of Citrus
    water use 1
    nitrate n efficiency 0
Setting Priorities for Research on Pollution Reduction Functions of Agricultural Buffers
    pollution functions 0
    reduction functions 0
Shade Effects on Salinity Tolerance of 'Valencia' Orange Trees on Rootstocks with Contrasting Salt Tolerance
    shade effects 0
    orange trees 1
    salt tolerance 1
Simple Water Level Indicator for Seepage Irrigation
    water level 1
Simulated Citrus Water Use from Shallow Groundwater
    water use 0
    shallow groundwater 0
Size, Biomass, and Nitrogen Relationships with Sweet Orange Tree Growth
    size nitrogen 0
    orange growth 0
    tree growth 0
Sod Irrigation On Farm Demonstration Project
    sod irrigation 0
    demonstration project 0
Soil Moisture Monitoring Techniques for Optimizing Citrus Irrigation
    soil techniques 0
    moisture techniques 0
    monitoring techniques 0
Soil Ratings for Selecting Pesticides for Water Quality Goals
    soil ratings 0
    water goals 0
    quality goals 0
Soil Temperature, Nitrogen Concentration, and Residue Time Affect Nitrogen Uptake Efficiency in Citrus
    soil temperature 1
    nitrogen concentration 0
    residue time 0
    nitrogen efficiency 0
    uptake efficiency 2
Solubility of Phosphorus and Heavy Metals in Potting Media Amended with Yard Waste Biosolids Compost
    yard compost 0
Sorption Desorption and Solution Concentration of Phosphorus in a Fertilized Sandy Soil

    sorption desorption concentration 0
South Florida Water Management District Water Quality Monitoring Network 1980 Annual Report
    florida water 1
    water water 0
    management water 0
    district water 0
    quality network 0
    monitoring network 0
Spatial and Temporal Variations of Water Quality in Drainage Ditches within Vegetable Farms and Citrus Groves
    water quality 2
    vegetable farms 3
    citrus groves 3
Springtime Nitrogen Uptake, Partitioning, and Leaching Losses from Young Bearing Citrus Trees of Differing Nitrogen Status
    nitrogen uptake 1
    bearing trees 0
    citrus trees 1
    nitrogen status 0
St. Lucie County Area: Soil Ratings for Selecting Pesticides
    st. area 0
    lucie area 0
    county area 0
    soil ratings 1
Stem Flow. Throughfall, and Canopy Interception of Rainfall by Citrus Tree Canopies
    tree canopies 1
Stormwater BMPs and Groundwater Protection: Infiltration Practices can have Unintended Consequences for Groundwater Supplies
    groundwater protection 0
    infiltration practices 0
    groundwater supplies 0
Subsurface Drip Irrigation for Water Table Control and Potato Production
    drip irrigation 0
    water control 0
    table control 0
    potato production 0
Surface Runoff Losses of Copper and Zinc in Sandy Soil
    surface losses 0
    runoff losses 1
THE BASIS FOR MATURE CITRUS NITROGEN FERTILIZATION RECOMMENDATIONS
    nitrogen recommendations 0
    fertilization recommendations 0

THE IMPACT OF FOUR HURRICANES IN 2004 ON THE FLORIDA CITRUS INDUSTRY: EXPERIENCES AND LESSONS LEARNED
TOMATO FERTILIZATION, GROUND COVER, AND SOIL NITRATE NITROGEN MOVEMENT
    tomato fertilization 0
    ground cover 2
    soil movement 0
    nitrate movement 0
    nitrogen movement 0
Temporal and spatial variations of nutrients in the Ten Mile Creek of South Florida, USA and effects on phytoplankton biomass
    ten creek 0
    mile creek 1
    phytoplankton biomass 0
Tensiometer Service, Testing and Calibration
    tensiometer service 0
Tensiometer Controlled Drip Irrigation Scheduling of Tomato
    drip irrigation 1
Tensiometers for Soil Moisture Measurement and Irrigation Scheduling
    soil measurement 0
    moisture measurement 0
    irrigation scheduling 1
The 1/128th of an Acre Sprayer Calibration Method
    acre method 1
    sprayer method 0
    calibration method 0
The Application of the Receiving Water Quality Model to the Conservation Areas of South Florida
    receiving model 0
    water model 0
    quality model 0
The Economic Impact of Water Quality on Citrus Production
    water quality 4
The Economics of Stormwater BMPs: An Update
    economics bmps 0
The Effect of Water Table Level, Crop Density, Climate and Related Factors on Evapotranspiration and Crop Growth
    water table 2
    level factors 0
    density climate 0
    climate factors 0
    evapotranspiration growth 0
The Effectiveness of Buffer Strips for Ameliorating Offsite Transport of Sediment, Nutrients, and Pesticides from Silvercultural Operations

The Relative Salt Tolerance of 'Rangpur' Seedlings and 'Arbequina' Olive Cuttings
    rangpur seedlings 0
    olive cuttings 0
The Water Quality Planning Model
The effect of ammonium nitrate fertiliser on frog (Rana temporaria) survival
    ammonium fertiliser 0
    nitrate fertiliser 1
    rana temporaria 1
The effects of spacing and thinning on stand and tree characteristics of 38 year old Loblolly Pine
    stand characteristics 0
The use of bed covers to protect fruiting strawberry from freezes..
Thresholds of Leaf Nitrogen for Optimum Fruit Production and Quality in Grapefruit
    leaf nitrogen 0
Tolerance of Low Quality (Saline) Irrigation Water by Florida grown Tomato Varieties
    irrigation water 0
    tomato varieties 1
Toxicological Studies in Tropical Ecosystems: an Ecotoxicological Risk Assessment of Pesticide Runoff in South Florida Estuarine Ecosystems
    risk assessment 0
    pesticide runoff 0
    florida ecosystems 0
    estuarine ecosystems 0
Transformation and Transport of Nitrogen Forms in a Sandy Entisol following a Heavy Loading of Ammonium Nitrate Solution: Field Measurements and Model Simulations
    nitrogen forms 0
    ammonium solution 0
    nitrate solution 0
    field measurements 0
    model simulations 0
Transformation of Urea and Ammonium Nitrate in an Entisol and a Spodosol Under Citrus Production
Tree Growth, Mineral Nutrition and Nutrient Leaching Losses from Soil of Salinized Citrus
    tree growth 0
    mineral nutrition 0
    leaching losses 1
Trickle Irrigation Scheduling 1: Durations of Water Applications
    trickle scheduling 0
    irrigation scheduling 0

    water applications 2
Trickle Irrigation Scheduling for Florida Citrus
    irrigation scheduling 2
    florida citrus 0
UF/IFAS Standardized Fertilization Recommendations for Agronomic Crops
    fertilization recommendations 3
Upland Detention / Retention Demonstration Program
    retention program 0
    demonstration program 0
Urban Lawn Infiltration Rates and Fertilizer Runoff Losses under Simulated Rainfall
    lawn infiltration 0
    fertilizer losses 1
    runoff losses 0
Use of Amendments to Reduce Leaching Loss of Phosphorus and Other Nutrients from a Sandy Soil in Florida
Use of Controlled Release Fertilizer for Young Citrus Trees
    citrus trees 1
Use of Water and Micro sprinkler Irrigation for Citrus Freeze Protection Project A 2
    water irrigation 0
    sprinkler irrigation 0
    freeze project 0
    protection project 0
Use of Water and Micro sprinkler Irrigation for Citrus Cold Protection
    water irrigation 0
Use of dolomite phosphate rock (DPR) fertilizers to reduce phosphorus leaching from sandy soil
Use of the Urease Inhibitor N (n Butyl) Thiophospheroric Triamide Decreased Nitrogen Leaching from Urea in a Fine Sandy Soil
    inhibitor n 0
    nitrogen leaching 0
Vadose Zone Processes and Chemical Transport
    zone processes 0
    chemical transport 0
Variation in Soil pH and Calcium Status Influenced by Microsprinkler Wetting Pattern for Young Citrus Trees
    soil ph 3
    calcium status 0
    microsprinkler pattern 0
    citrus trees 0
Vegetable Production Using Fully Enclosed Sub irrigation: Declining vs. Static Water Table Positions
    vegetable production 0

    sub irrigation 1
    water positions 0
    table positions 0
Vegetated Filter Strips for Non Point Source Pollution Control Nutrient Considerations
    filter strips 2
    strips control 0
    pollution control 0
Vegetative buffer zones as pesticide filters for simulated surface runoff
    buffer zones 1
    pesticide filters 0
    surface runoff 2
WATER AND FERTILIZER MANAGEMENT OF MICROIRRIGATED FRESH MARKET TOMATOES
    market tomatoes 0
WATER MANAGEMENT OF WETLAND CITRUS IN FLORIDA
    water management 0
WATER NEEDS OF FLORIDA VEGETABLE CROPS
    water needs 3
    florida crops 0
    vegetable crops 0
WATER REQUIREMENTS AND CROP COEFFICIENTS FOR STRAWBERRY PRODUCTION IN S.W. FLORIDA
    water requirements 2
    crop coefficients 1
    s.w. florida 0
WATER REQUIREMENTS FOR DRIP IRRIGATED STRAWBERRIES IN SOUTH CENTRAL FLORIDA
    water requirements 1
WATER REQUIREMENTS OF MICROIRRIGATED STRAWBERRIES
    water requirements 0
WEED CONTROL USING GRANULAR HERBICIDESON CONTAINER GROWN ORNAMENTALS
    herbicideson ornamentals 0
    container grown ornamentals 0
Water Harvesting and Recycling Ebb and Flow System in a Container Nursery
    ebb system 0
    container nursery 0
Water Management for Citrus Production in the Florida Flatwoods
    water management 0
    florida flatwoods 0
Water Quality Analysis in the Water Conservation Areas
    water analysis 0
    quality analysis 0
    conservation areas 0

Water Quality Aspects of the Caloosahatchee River Systems, Phase 1
    water aspects 0
    quality aspects 0
Water Quality Impact of Vegetative Filter Strips
    quality impact 0
    filter strips 1
Water Quality Impacts of Vegetative Filter Strips and Riparian Areas
    quality impacts 0
    filter strips 1
    riparian areas 2
Water Quality Management Plan for the S 2 and S 3 Drainage Basins in the Everglades Agricultural Area
    water plan 0
    quality plan 0
    management plan 0
    drainage basins 0
Water Quality and Nutrient Loading of the Major Inflows from the Everglades Agricultural Area to the Conservation Areas, Southeast Florida
    water quality 2
    nutrient loading 0
    conservation areas 3
    southeast florida 0
Water Quality in the Everglades Agricultural Areas and its Impact on Lake Okeechobee
    water quality 0
    lake okeechobee 1
Water Quality of the Caloosahatchee River System Phase II
    water quality 4
    river system 1
Water Recycling in Seepage Irrigation of Potatoes
    water recycling 0
Water Relations of 7 Year Old, Containerized Citrus Trees Under Drought and Flooding Stress
    water relations 4
    drought stress 1
Water Requirement and Crop Coeffiencies for Strawberry Production in S.W. Florida
    requirement coeffiencies 0
    s.w. florida 0
Water Requirements and Crop Coefficients for Tomato Production in Southwest Florida
    water requirements 4
    crop coefficients 3
    tomato production 1

Water Requirements for Citrus
    water requirements 0
Water Requirements for Flatwoods Citrus
    water requirements 1
Water Requirements for Florida Agricultural Crops
    water requirements 0
Water Requirements for the Production of Landscape Ornamentals in Small Containers
    water requirements 4
    landscape ornamentals 2
Water Table Fluctuation and Depth of Rooting of Citrus Trees in the Indian River Area
    water table 2
    indian area 0
    river area 0
Water Table Management Using Microirrigation Tubing
    water management 0
    table management 0
    microirrigation tubing 1
Water Table Monitoring
    table monitoring 1
Water Use and Irrigation Scheduling of Young Blueberries
    use scheduling 0
Water Use by Florida Vegetable Crops
    florida crops 0
    vegetable crops 0
Water Use in Strawberry Transplant Establishment
    water use 0
    transplant establishment 0
Water and Fertilizer Management of Micro Irrigated Tomato Production on Sandy Soils in Southwest Florida: Final Report
    water management 0
    tomato production 2
Weed Control in Florida Ponds
    weed control 0
    florida ponds 0
Weed Management in Fence Rows and Non Cropped Areas
    weed management 1
    fence rows 1

APPENDIX C
FREQUENCY ANALYSIS RESULTS BMP PUBLICATION LIBRARY

This appendix contains a list of the 200 most frequent concepts extracted from the corpus of the BMP Publication Library and the 200 most frequent concepts extracted from a corpus generated with the use of a web search engine. All n-grams that matched the NAL Thesaurus were extracted.

     Document corpus                 Web corpus
No.  Concept              Frequency  Concept                    Frequency
1    water                27046      water                      15274
2    soil                 13137      soil                       7687
3    irrigation           9895       plants                     5316
4    florida              8973       irrigation                 4863
5    trees                7002       nitrogen                   4471
6    plants               5979       management                 3842
7    crops                5163       citrus                     3828
8    area                 5073       crops                      3738
9    citrus               4974       information                3487
10   management           4839       fertilizers                3401
11   fertilizers          4539       nutrients                  2826
12   nitrogen             4319       fruits                     2533
13   surfaces             4095       trees                      2448
14   time                 4054       nitrates                   2313
15   samples              4025       phosphorus                 2129
16   pesticides           3971       chemicals                  2066
17   fruits               3890       florida                    2018
18   lakes                3726       roots                      1850
19   fields               3717       pesticides                 1814
20   rainfall             3497       methods                    1784
21   methods              3409       models                     1767
22   seasons              3387       surfaces                   1735
23   nutrients            3373       oranges                    1711
24   rivers               3357       research                   1670
25   yields               3348       leaching                   1644
26   roots                3264       area                       1625
27   runoff               3191       weeds                      1517
28   depth                3134       pollution                  1493
29   phosphorus           3128       homes                      1491
30   water quality        2938       runoff                     1482
31   concentration        2815       rivers                     1391
32   models               2774       yields                     1305
33   flow                 2600       water quality              1298
34   canals               2595       fields                     1278
35   wells                2569       solids                     1278
36   annuals              2527       lakes                      1272
37   dates                2507       vegetables                 1243
38   figs                 2408       forests                    1163
39   sampling             2319       filters                    1143
40   basins               2316       wells                      1137
41   water table          2249       land                       1125
42   leaching             2246       drainage                   1101
43   research             2243       buffers                    1057
44   ponds                2225       sprinklers                 1026
45   materials            2101       foods                      1022
46   ranges               2054       herbicides                 1005
47   drainage             2044       air                        983
48   nitrates             1974       groundwater                976
49   volume               1904       agriculture                966
50   monitoring           1863       time                       916
51   ph                   1845       conservation               913
52   herbicides           1842       wastes                     897
53   sand                 1840       page                       889
54   groundwater          1788       flow                       860
55   information          1755       lysimeters                 842
56   land                 1642       solutions                  842
57   weeds                1586       ph                         831
58   springs              1585       gardens                    830
59   feet                 1507       greens                     820
60   costs                1504       pests                      797
61   uptake               1480       uptake                     797
62   groves               1438       watersheds                 782
63   watersheds           1433       temperature                776
64   wetlands             1424       landscapes                 771
65   chemicals            1416       equipment                  770
66   oranges              1375       materials                  762
67   distribution         1332       biomass                    755
68   storage              1296       grapefruits                750
69   acre                 1281       basins                     741
70   design               1276       farms                      740
71   conservation         1272       seeds                      734
72   vegetation           1272       ammonia                    733
73   growers              1233       wetlands                   728
74   temperature          1233       carbon                     704
75   water management     1223       minerals                   699
76   fertilization        1189       sensors                    699
77   nozzles              1186       definitions                690
78   leaves               1165       potassium                  686
79   loads                1138       insects                    679
80   water use            1130       liquids                    678
81   shows                1129       drinking                   671
82   pests                1121       iron                       668
83   pumps                1115       nutrition                  667
84   lysimeters           1087       acids                      666
85   urea                 1080       monitoring                 661
86   soil water           1031       weather                    658
87   solutions            1017       animals                    652
88   equations            1012       lines                      650
89   density              1002       chlorides                  647
90   foods                992        ranges                     647
91   frequency            984        environment                645
92   weight               983        cold                       644
93   containers           965        energy                     636
94   marsh                957        concentration              635
95   canopy               955        sediments                  626
96   irrigation systems   946        ponds                      618
97   evaporation          939        humans                     617
98   equipment            931        sand                       616
99   farms                921        design                     614
100  lines                919        floods                     612
101  streams              904        vegetation                 609
102  rootstocks           887        economics                  605
103  tensiometers         879        turf                       604
104  evapotranspiration   877        industry                   594
105  rain                 872        climate                    592
106  vegetables           872        evapotranspiration         588
107  irrigation water     868        irrigation systems         586
108  air                  856        corn                       585
109  bermudagrass         856        costs                      581
110  seepage              851        water use                  576
111  page                 850        drip irrigation            566
112  surveys              845        manure                     563
113  tanks                845        shows                      555
114  amines               843        work                       553
115  habitats             839        erosion                    551
116  meters               837        sprayers                   550
117  color                833        drinking water             547
118  reduction            828        rainfall                   545
119  diameter             796        hybrids                    544
120  fish                 794        rootstocks                 543
121  pressure             787        aquifers                   542
122  nets                 772        wind                       542
123  sediments            767        soil water                 536
124  supply               766        fish                       529
125  length               765        tools                      528
126  salts                762        annuals                    527
127  fertigation          734        oils                       523
128  oxygen               726        people                     520
129  soil moisture        725        calibration                519
130  salinity             718        webs                       519
131  cells                707        lemons                     518
132  scales               707        storms                     515
133  sandy soils          697        technology                 509
134  floodplains          695        distribution               507
135  reservoirs           695        fertilization              504
136  turf                 695        ground water               496
137  acids                693        pollutants                 496
138  agriculture          693        sugars                     495
139  application rates    688        water table                491
140  objectives           687        nozzles                    486
141  summer               670        supply                     485
142  storms               667        reduction                  482
143  publications         666        volume                     481
144  leachates            663        soil moisture              475
145  pollution            663        balance                    462
146  diuron               657        salts                      458
147  stormwater           657        streams                    457
148  solids               652        scales                     453
149  greens               647        parks                      452
150  winter               647        growers                    449
151  sprayers             619        leaves                     447
152  height               611        samples                    446
153  pollutants           608        prices                     445
154  weather              606        sodium                     445
155  buffers              605        epa                        444
156  aquifers             602        commercials                438
157  biomass              602        metals                     428
158  mass                 601        elements                   425
159  grapefruits          598        copper                     424
160  emitters             596        pressure                   422
161  optimization         595        california                 421
162  energy               594        extraction                 419
163  liquids              593        salinity                   419
164  usa                  593        seasons                    419
165  glyphosate           589        farmers                    414
166  cooperatives         588        wastewater                 413
167  planting             588        fungicides                 402
168  chlorides            580        americans                  401
169  techniques           580        sandy soils                400
170  commercials          577        performance                397
171  width                571        containers                 396
172  plastics             566        professionals              394
173  epa                  559        particles                  393
174  education            558        varieties                  391
175  pastures             555        phytophthora               386
176  potassium            555        limes                      385
177  journals             550        degradation                384
178  dicamba              548        ethanol                    382
179  layers               547        fires                      379
180  filters              541        buildings                  378
181  slope                538        sweets                     378
182  imazapyr             534        phosphates                 377
183  ground water         533        markets                    372
184  metals               525        wildlife                   371
185  dissolved oxygen     524        nonpoint source pollution  369
186  transport            521        schools                    368
187  growing season       514        habitats                   366
188  copper               512        techniques                 365
189  hydraulics           510        mites                      363
190  floods               509        springs                    359
191  forests              508        publications               358
192  wind                 508        hydrilla                   356
193  clay                 498        urea                       356
194  ingredients          496        potatoes                   355
195  spacing              495        bay                        354
196  land use             487        greenhouses                354
197  work                 487        water pollution            354
198  sprinklers           486        united states              353
199  cultivars            477        evaporation                351
200  demand               476        canopy                     347
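The frequency counts in this appendix come from matching n-grams in a corpus against thesaurus terms. A minimal sketch of that kind of count is given below. It is an illustration only, not the implementation used in this work: the function names, the regular-expression tokenizer, the limit of three words per n-gram, and the toy term list standing in for the NAL Thesaurus are all assumptions, and overlapping matches (for example "water" inside "water quality") are each counted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def count_thesaurus_ngrams(documents, thesaurus_terms, max_n=3):
    """Count every n-gram (1..max_n words) that exactly matches a thesaurus term."""
    # Normalize the terms the same way as the documents so they compare equal.
    terms = {" ".join(tokenize(term)) for term in thesaurus_terms}
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = " ".join(tokens[i:i + n])
                if gram in terms:
                    counts[gram] += 1
    return counts


# Hypothetical toy data; the real term list would be loaded from the thesaurus.
toy_terms = ["water", "soil", "water quality", "irrigation"]
toy_docs = [
    "Water quality and soil water movement",
    "Irrigation water quality",
]
counts = count_thesaurus_ngrams(toy_docs, toy_terms)
top = counts.most_common(2)
print(top)
```

Calling `counts.most_common(200)` on a full corpus would give a ranked list of the same shape as the table above. Exact string matching is used here, so singular and plural forms are treated as different concepts unless the term list contains both.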

D., Raghavan, P., & Schtze, H. (2008). Introduction to information retrieva l Cambridge ; New York: Cambridge University Press. Manning, C. D., & Schtze, H. (1999). Foundations of statistical natural language processing Cambridge, Mass.: MIT Press.

PAGE 168

168 Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of english: The penn treebank. Comput.Linguist., 19 (2), 313 330. Marneffe, M., & Manning, C. D. (2008). Stanford typed dependencies manual Stanford University. Moskovitch, R., Martins, S. B., Behiri, E., Weiss, A., & Shahar, Y. (2007). A comparative evaluation of full text, concept based, and context sensitive search. Journal of the American Medical Informatics Association, 14 (2), 164 174. NAL. (2011a). Thesaurus structure of the agricultural thesaurus. Retrieved 02/07, 2011, from http://agclass.canr.msu.edu/agt/structure.shtml NAL. (2011b). National agricultural library thesaurus and glossary. Retrieved 02/07, 2011, from http://agclass.canr.msu.edu/agt.shtml NAL. (2011c). What's new in 2011 edition of agricultural thesaurus. Retrieved 02/07, 2011, from http://agclass.canr.msu.edu/dne/whatsnew.shtml NDWR. (2011). Water words dictionary. Retrieved 03/12, 2011, from http://water.nv.gov/WaterPlanning/dict 1/ww index.cfm OAI. (2008). The open archives initiative protocol for metadata harvesting. Retrieved 03/12, 2011, from http://www.openarchives.org/OAI/openarchivesprotocol.html Oracle. (201 1). Oracle technology network java. Retrieved 03/12, 2011, from http://www.oracle.com/technetwork/java/index.html Pease, A. (2011). Suggested upper merged ontology (SUMO). Ret rieved 03/12, 2011, from http://www.ontologyportal.org/ Pinto, H. S., & Martins, J. P. (2004). Ontologies: How can they be built? Knowledge and Information Systems, 6 (4), 441 464. Porter, M. F. ( 1980). An algorithm for suffix stripping. Program, 14 (3), 130 137. Porter, M. F. (2006). The porter stemming algorithm. Retrieved 02/07, 2011, from http://tartarus.org/~martin/PorterSt emmer/ Prime Recognition. (2011). PrimeOCR Retrieved from http://www.primerec.com/ Princeton University. (2010). About WordNet. Retrieved 02/07, 2011, from http://wordnet.princeton.edu Radev, D. R. (2001). 
Natural language processing FAQ. Retrieved 02/07, 2011, from http://www.aclweb.org/nlpfaq.txt Random house webster's electronic dic tionary (1992). Reference Software International.

PAGE 169

169 Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. Proceedings of the 14th International Joint Conference on Artificial Intelligence Volume 1, Montreal, Quebec, C anada. 448 453. RSS Advisory Board. (2009). RSS 2.0 specification. Retrieved 03/12, 2011, from http://www.rssboard.org/rss specification Sahlgren, M. (2006). The word space model: Usin g distributional analysis to represent syntagmatic and paradigmatic relations between words in high dimensional vector spaces. Stockholm University, Department of Linguistics). Salton, G. (1989). Automatic text processing: The transformation, analysis, an d retrieval of information by computer. Boston, MA, USA: Addison Wesley Longman Publishing Co., Inc. Sanchez, D., & Moreno, A. (2007). Bringing taxonomic structure to large digital libraries. Int.J.Metadata Semant.Ontologies, 2 (2), 112 122. Soergel, D., Lauser, B., Liang, A. C., Fisseha, F., Keizer, J., & Katz, S. (2004). Reengineering thesauri for new applications: The AGROVOC example. J.Digit.Inf., 4 (4) Stanford NLP Group. (2010). The stanford parser: A statistical parser. Retrieved 02/07, 2011, from http://nlp.stanford.edu/software/lex parser.shtml Thunkijjanukij, A., Kawtrakul, A., Panichsakpatana, S., & Veesommai, U. (2009). Developing rules and criteria for rice ontology construction. Int.J.Metadata Semant.Ontologies, 4 (1/2), 54 64. Tudhope, D., & Nielsen, M. L. (2006). Introduction to knowledge organization systems and services. The New Review of Hypermedia and Multimedia, 12 (1), 3 9. Turney, P. D. (2001). Mining the we b for synonyms: PMI IR versus LSA on TOEFL. Paper presented at the Proceedings of the 12th European Conference on Machine Learning, 491 502. Retrieved from http://portal.acm.or g/citation.cfm?id=645328.650004 University of Pennsylvania. (1999). The penn treebank project. Retrieved 02/07, 2011, from http://www.cis.upenn.edu/~treebank/ Uschold, M., & King, M. (1995). 
Towards a methodology for building ontologies. Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal. USGS. (2011). Water science glossary of terms. Retrieved 03/12, 2011, from http://ga.water.usgs.gov/edu/dictionary.html

PAGE 170

170 Vila, K., & Ferrndez, A. (2009). Developing an ontology for improving question answering in the agricultural domain. In F. Sartori, M. Sicilia & N. Manouselis (Eds.), Metadata and semantic researc h (pp. 245 256) Springer Berlin Heidelberg. VIVO. (2011a). About VIVO. Retrieved 02/07, 2011, from http://vivoweb.org/about VIVO. (2011b). VIVO specifications and technical information. Retrieved 02/07 2011, from http://vivoweb.org/about/faq/vivo specifications and technical information Yahoo! Inc. (2011). AlltheWeb. Retrieved 02/07, 2011, from http://www.alltheweb.com/ Yang, C., Yang, K., & Yuan, H. (2007). Improving the search process through ontology based adaptive semantic search. The Electronic Library, 25 (2), 234 248. Ziemba, L., Cornejo, C. & Beck H. W. (2011) A water conservation d igital library using ontologies The Electronic Library 29 ( 2 ) 200 211 Zipf, G. K. (1949). Human behaviour and the principle of least effort Cambridge, MA: Addison Wesley.

BIOGRAPHICAL SKETCH

Lukasz Ziemba obtained his M.Sc. degree in Informatics and Econometrics and his M.Sc. degree in Management and Marketing at the University of Lodz, Poland. Since 2006 he has worked as a Research Assistant at the Department of Agricultural and Biological Engineering at the University of Florida, specializing in the application of information technology in the field of natural resources. In the spring of 2011 he received his Ph.D. degree in Agricultural and Biological Engineering from the University of Florida.