
Applying Database and Ontology Design Techniques to a NASA Biological Research Repository



APPLYING DATABASE AND ONTOLOGY DESIGN TECHNIQUES TO A NASA BIOLOGICAL RESEARCH REPOSITORY

By CHRISTOPHER DAVIDSON

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ENGINEERING

UNIVERSITY OF FLORIDA
2003


Copyright 2003 by Christopher Davidson


ACKNOWLEDGMENTS

This project was jointly funded by the University of Florida Agricultural and Biological Engineering Department, the Advanced Life Support program at the Kennedy Space Center, and the NASA Graduate Student Research Program. Without their collective support and encouragement, ALLSTAR would not exist.

I wish to thank my advisor, Dr. Howard Beck, for his assistance throughout this research project, and Dr. Ray Bucklin and Dr. Doug Dankel for their continuing input. Peter Chetirkin from Dynamac at the Kennedy Space Center was instrumental in my comprehension of the ALS data and its potential for scientific and academic use. The rest of the ALS staff at KSC were always quick with answers to even the most trivial of questions.

My wife, my family, and my friends unfailingly surround me with support and love. Without them, endeavors like these would be impossible and unimportant.


FOREWORD

"What information consumes is rather obvious: it consumes the attention of its recipients. Hence, a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it" (Simon 1971).

"Intelligence just means information now ... people who know a lot about technology would like to console us with their faith that it's neutral, that tools won't change human nature. But how do they know?" (Ford 1998).

Nobel laureate Herbert Simon and novelist Richard Ford express similar concerns, despite the decades separating their comments. Both ponder the deleterious effects of prodigious sources of information influencing humankind's self-perception and pace of life. Scientific research by definition necessitates a tight focus of study and a hesitance to consider immeasurable, unobservable, or statistically unwieldy properties of an experiment. But when the experiment is the collection, organization and dissemination of a large knowledge base, the effects on its field of interest understandably cannot be gauged for a long time to come.

Foresight and introspection urge scientific researchers to carefully consider more than just their experimental conclusions. Without adequate means to review and analyze experimental data, in time the information becomes misplaced or forgotten. Luddites and technophobes are right to question the effects on society of unbounded, limitless stores of information. The unthinkably large database of Simon's dreams and Ford's nightmares will soon exist. What remains to be seen is how humankind will allow these tools to shape its existence and beliefs.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
FOREWORD
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS
ABSTRACT

CHAPTER

1 INTRODUCTION
    Project Background
    Data Acquisition and Storage
    Limitations of Data Retrieval and Analysis
    Object-Oriented Database Design
    Domain Ontology Development
    Project Goals of ALLSTAR

2 OBJECT DATABASE DESIGN FOR ALLSTAR
    Legacy Relational Schema of ALS
    Existing Methods of Database Access
    Object-Oriented Database Design
    Object Model Development
    Object Modeling with UML
    Class Relationships in UML
        Superclass and Subclass
        Array dimensionality
        Containment
    Java Class Generation
    Implementing Object Persistence

3 ONTOLOGY DEVELOPMENT FOR ALS
    What is an Ontology?
    Ontology Language Selection
        Unified Modeling Language (UML)
        Resource Description Framework (RDF)
        The DARPA Agent Markup Language (DAML)
        Ontology Inference Layer (OIL)
    Advanced Ontology Operations
    Implementation of ALS Ontology

4 ALLSTAR APPLICATION SOFTWARE DEVELOPMENT
    Application Design Considerations
        Query Processing
        Database Navigation
        On-Screen Visualization of Data
        Graphical User Interface Considerations
    Application Deployment Issues
        Security Issues
        Compatibility Issues

5 RESULTS AND DISCUSSION
    Human Researcher Access of ALS Data
        Application Performance
        Online Accessibility of Data
    Machine Access of ALS Data
    Electronic Publication of Experimental Data
    Future Directions of Research

APPENDIX

A ENTITY-RELATIONSHIP ALS DIAGRAM
B UNIFIED MODELING LANGUAGE ALS DIAGRAM
C ALLSTAR USER EVALUATION

LIST OF REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

2-1. Partial excerpt from EXPERIMENT relation of ALS relational database
2-2. Entity descriptions of ALS relational database schema
2-3. Typical SQL-92 standard data types supported by RDBMS applications
4-1. Database queries in the ALS domain
A-1. Entity descriptions from existing ALS relational database schema


LIST OF FIGURES

1-1 Simplified material integration model of bioregenerative life support system
1-2 Sample database query
2-1 Sample SQL relational database query including a join operation
2-2 Sample UML class diagram for two ALS object database classes
2-3 Object relationships in the ALS object database UML class diagram
2-4 Generating Java source code from UML class diagram
3-1 Communication among three primary ALLSTAR components
3-2 Subsumption relationships among ontology language standards
3-3 Advanced and future roles for developed ALS ontology
3-4 Web Taxonomy as an ontology visualization and editing tool
3-5 Multiple levels of taxonomic classification in the ALS ontology
4-1 ALS domain database query with object-based navigation
4-2 Sample synonym lookup in ALS domain ontology
4-3 Conventional tabular style of object database navigation
4-4 Graphical object database navigation
4-5 Tabular display of object instance in database
4-6 Graphical display of growth chamber temperature data
4-7 Communication among the RMI client/server application components
A-1 Entity-relationship diagram for existing ALS database
B-1 Complete UML diagram for ALS object database model
B-2 Expanded descriptions of individual UML class diagram components


LIST OF ACRONYMS

ALLSTAR   Advanced life support/space and terrestrial research repository
ALS       Advanced life support
ALSDMG    ALS data management system
ANN       Artificial neural network
BLSS      Bioregenerative life support system
BPC       Biomass production chamber
CEC       Controlled environment chamber
CELSS     Controlled ecological life support system
DAML      DARPA agent markup language
DARPA     Defense advanced research projects agency
DBMS      Database management system
DTD       Document type definition
GIS       Geographical information system
GUI       Graphical user interface
HTML      Hypertext markup language
JRE       Java runtime environment
JWS       Java web start
NASA      National aeronautics and space administration
ODBC      Open database connectivity
ODMG      Object data management group
OWL       Web ontology language
OMG       Object management group
OQL       Object query language
OS        Operating system
RDBMS     Relational database management system
RDF(S)    Resource description framework (schema)
RMI       Remote method invocation
SGML      Standard generalized markup language
SQL       Structured query language
UML       Unified modeling language
UNDACE    Universal data acquisition and control engine
W3C       World wide web consortium
XML(S)    Extensible markup language (schema)


Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Engineering

APPLYING DATABASE AND ONTOLOGY DESIGN TECHNIQUES TO A NASA BIOLOGICAL RESEARCH REPOSITORY

By Christopher Davidson

August 2003

Chair: Howard Beck
Major Department: Agricultural and Biological Engineering

This thesis discusses the design and implementation of ALLSTAR (Advanced Life Support/Space and Terrestrial Research Repository), an Internet-accessible, object-oriented database application capable of facilitating access to biological experiment data subsets by both human researchers and automated search agents.

For decades, researchers from the National Aeronautics and Space Administration (NASA) have conducted experiments exploring the feasibility of a bioregenerative life support system. Such a system could potentially extend the duration of manned spaceflight activities, including future trips to the Moon, to the planet Mars, or even farther from Earth. Scientists at the Kennedy Space Center (KSC) in Florida involved with the Advanced Life Support (ALS) program have conducted hundreds of biological experiments; most studied plant development in closed-environment growth chambers. Over fifty million data observations have been recorded; unfortunately, the underlying relational database structure and the copious amount of data records produced have, over time, led to difficulties in accessing and evaluating the data set.

The ALLSTAR application was developed with the intention of facilitating both human and machine readability of and access to the ALS data. Doing so required ALS domain knowledge analysis, construction of both an object-oriented database model and an ALS-specific domain ontology, and a foundation for the future development of ALS-related decision support system and predictive reasoning components that make use of the ALLSTAR data access model.

Both the object database model and the domain ontology were successfully merged into a single application intended for ALS researcher use. An application evaluation was conducted and user opinions were collected. Suggestions for future research and design work related to the ALLSTAR project are also discussed.


CHAPTER 1
INTRODUCTION

The core concept driving the development of the ALLSTAR (Advanced Life Support/Space and Terrestrial Research Repository) database application and research repository is the desire of scientific researchers to consolidate vast collections of data and knowledge into a single location. Common sense suggests that a more intuitively navigated database or user interface should enhance the accessibility of the underlying data records. Putting these ideas into practice, however, requires acceptance by a research staff or the general public. And if Simon's ideas (Simon 1971) are interpreted literally, no specific model of database organization can remedy the fact that a user interface is a crutch necessitated by the inability to model and present knowledge in a universally understandable fashion.

But the cautionary opinions of Simon and Ford (Ford 1998) probably did not take into consideration what is now thought to be the next fundamental advance in information technology: data and information analysis by machines, not people, to form highly organized knowledge bases from which logical conclusions about the natural world can be drawn automatically. Although computers are to blame for today's attention economy and many people's high-velocity lives, it is computers that have the potential to ultimately relieve civilization of this burden of information overload. The computer science field of artificial intelligence is counting on database theorists and designers to help generate the vast, domain-specific knowledge reservoirs capable of fueling the first wave of truly intelligent software applications.


Project Background

The Advanced Life Support (ALS) program, a research team working with the National Aeronautics and Space Administration (NASA), has spent decades studying the possibility of growing plants in controlled environments to support human life as part of a bioregenerative life support system (BLSS). Earth, orbiting spacecraft, and even other planetary surfaces each pose unique growth environments and design considerations. The ALS research is poised to become one of the first such machine-understandable knowledge domains. Foresight on the part of ALS researchers at the Kennedy Space Center (KSC) in Florida resulted in archives of their biological experiment data dating back to 1986, comprising more than fifty million individual data observations that span more than 300 closed-chamber experiments.

Current technology readiness levels do not yet make it feasible to send humans far into space, including on missions such as NASA's well-publicized but indefinite goal of placing a human crew on the planet Mars. Doing so would require either significant advances in spacecraft rocket propulsion technology or advances in bioregenerative life support system development. Without at least one of these advances, equivalent system mass (ESM) limitations are too prohibitive to consider sending one or more human crew members much farther than the Moon's orbit around the Earth. The mass needed to provide adequate sustenance, water, fuel and shielding for internal crew quarters far exceeds plausible limits for spacecraft design (Wieland 1994). A BLSS would allow astronauts to recycle mass, such as exhaled CO2 and metabolic waste, with the assistance of plants as both a renewable food source and an atmospheric revitalization component.

A successful BLSS would mimic many life-cycle processes found on Earth (Figure 1-1): CO2/O2 gas exchange between plants and humans, transpiration as a viable method of water purification and reclamation, and human metabolic waste processing and renewal with the assistance of selected plants (Wieland 1994).

[Figure 1-1. Simplified material integration model of bioregenerative life support system. Diagram components: Human, Microbial Bioreactor, Biomass Production. Labeled flows: food, water, oxygen, metabolic waste, CO2, gray water, nutrients and CO2, inedible material and O2.]

For years the ALS staff has been conducting controlled-environment plant growth experiments. The results obtained are typically used in the analysis of space and gravitational biology concepts, including the development and validation of crop growth models with respect to various parametric inputs: O2/CO2 partial pressures, air temperature and relative humidity, ambient light levels, and soil/solution pH and electroconductivity, to name a few. Many such models seek to maximize the harvest index of the total biomass while minimizing the growth area requirements, to make the most effective use of a limited-space crop growth scenario for food source production. Successfully tested crops matching this qualification include wheat, lettuce, soybean and potatoes (Wheeler et al. 1996). Other candidate ALS crops are being considered for possible BLSS roles as metabolic waste processors for wastewater or gray water (human hygiene water) produced by a potential crew.

Data Acquisition and Storage

The current control system used by ALS researchers at KSC, the Universal Data Acquisition and Control Engine (UNDACE), provides control and monitoring capabilities for fifteen permanent controlled environment chambers (CECs), smaller growth chambers, and bench-top resource recovery experiments. Similar control and monitoring are also available for the Biomass Production Chamber (BPC), a cylindrical steel chamber (20 m² area, 113 m³ volume) formerly used for hypobaric tests during NASA's 1958-1963 Mercury Project (Wheeler et al. 1996). UNDACE hardware for monitoring and control includes a Sun SPARCstation, Opto-22 digital and analog input/output boards, and all equipment required to route monitored sensor data to a central Oracle relational database. UNDACE was developed at KSC and remains the primary interface between operator and experiment (Strayer et al. 2002).

Of particular interest to the ALLSTAR project are the manner and frequency with which raw data observations are recorded. The vast majority of experimental observations are stored automatically to relations in the Oracle database. The underlying relational database model itself is discussed further in Chapter 2.

Limitations of Data Retrieval and Analysis

The existing UNDACE system is adequate for monitoring ALS experiments in progress. Researcher-defined setpoint levels (values from which the real parameter measurement should not be allowed to statistically deviate) for parameters such as air temperature or CO2 concentration can be checked against real-time sensor data in a timely fashion to allow environment adjustments and to ensure the integrity of an experiment. For purposes of post-experiment data analysis, however, the usefulness of the underlying relational database model (Appendix A) and its associated software-based interface tools (UNDACE) falters. Data analysis typically involves a researcher importing subsets of ALS experiment data from Oracle (via an Oracle client or a Microsoft Access form/report) into another application, such as Excel (2002), SPSS (2002), MATLAB (2002) or other programs, to carry out statistical analysis of the data (ALS Survey 2003, Appendix C).

As a result of using a relational database model, queries (questions to be answered using the available data in the database) are restricted to combinations of Structured Query Language (SQL) statements, which are not immediately intuitive in nature. Even relevant graphical user interface (GUI) tools must rely on SQL commands to carry out a database query. Although heralded in the mid-1970s as a significant achievement in database structure modeling, Codd's proposed relational model (Codd 1970) carries some limitations as seen from the perspective of a scientific researcher trying to drill down into a large data set to retrieve a specific desired subset for analysis purposes. Reliance on primary and foreign keys to link relation entities introduces great potential for redundancy in an ill-designed database schema or in a database that has outgrown its own model structure. The existing ALS relational database falls into this latter category and has thus fallen victim to misuse and overextension of its intended capabilities, as explored in Chapter 2. Also, relational databases store only primitive data types (e.g., string/text, integer, floating-point number, date/time) with few exceptions. Lastly, the relational database model is not capable of attempting any type of semantic interpretation of its contents; nor can it easily express its contents as intuitively organized knowledge.
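The reliance on primary and foreign keys noted above can be made concrete with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the two-relation schema is an assumption for illustration only: the relation and column names are borrowed from the sample query in Figure 1-2, while the column types, the row values, and the helper function experiments_by_study_type are invented here and are not part of the ALS database.

```python
import sqlite3

# Hypothetical two-relation schema. Column names echo Figure 1-2;
# the types and rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ALS_CODES (
        IDCODE         INTEGER PRIMARY KEY,
        TX_DESCRIPTION TEXT
    );
    CREATE TABLE EXPERIMENT (
        ID_EXPERIMENT    TEXT PRIMARY KEY,
        TX_PRIMARY_INVST TEXT,
        TX_CROP_PRF      TEXT,
        CDGCS            INTEGER REFERENCES ALS_CODES(IDCODE)
    );
""")
conn.executemany(
    "INSERT INTO ALS_CODES VALUES (?, ?)",
    [(1, "seed positioning"), (2, "nutrient delivery")],
)
conn.executemany(
    "INSERT INTO EXPERIMENT VALUES (?, ?, ?, ?)",
    [
        ("EXP-A", "Stutte", "Wheat", 1),
        ("EXP-B", "Stutte", "Wheat", 2),
        ("EXP-C", "Smith", "Lettuce", 1),
    ],
)


def experiments_by_study_type(conn, investigator, crop, study_type):
    """Return experiment IDs matching all three search terms.

    The study type lives in a separate relation, so the caller must
    know the foreign key (CDGCS -> IDCODE) to write the join.
    """
    sql = """
        SELECT e.ID_EXPERIMENT
        FROM EXPERIMENT AS e
        JOIN ALS_CODES AS c ON e.CDGCS = c.IDCODE
        WHERE e.TX_PRIMARY_INVST LIKE ?
          AND e.TX_CROP_PRF LIKE ?
          AND c.TX_DESCRIPTION LIKE ?
        ORDER BY e.ID_EXPERIMENT
    """
    params = tuple(f"%{term}%" for term in (investigator, crop, study_type))
    return [row[0] for row in conn.execute(sql, params)]


matches = experiments_by_study_type(conn, "Stutte", "Wheat", "seed positioning")
print(matches)
```

Even in this toy case, the researcher has to know which relation holds the study-type description and which key links it to the experiment record before the question can be asked at all.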


6 Object-Oriented Database Design Despite the present-day widespread deployment of relational database management systems (RDBMSs), including Oracle Database (2003), SQL Server (2000) and DB2 Universal Database (2003) as a few large-scale commercially available examples, the more recent field of object-oriented database design continues to grow in popularity. Having itself arisen from frame-based knowledge representation (KR) approaches of decades past, object-oriented design drew serious attention as a KR paradigm in the 1990s and helped spawn several successful computer programming languages, most prominently Smalltalk, C++ and Java. It is believed that the evolution of the existing ALS relational data model to an object-oriented model (hereafter referred to as an object model) may increase the intuitiveness, utility and efficiency of any humanor machine-accessible interface attempting to browse, query or display specific data subsets. The first step in designing the ALLSTAR application was to analyze the underlying ALS relational database schema in addition to all ALS-related material being considered for inclusion in the new database. This material included ALS experiment protocols describing each potential experiment in detail before it is carried out, engineering specifications of sensors and other equipment used to monitor experimental data, and relevant bibliographic citations from ALS journal publications or technical memorandums in the public domain. Doing so allowed the construction of an object-oriented database schema using Unified Modeling Language (UML) design tools. The schema definition includes objects, their attributes, and their interrelationships. Owing to the formidable size of the existing ALS relational database (50,000,000+ records), the raw sensor data from experimental observations were kept in the relational entities, since

PAGE 20

7 relational databases are already well-suited for large collections of sequential time series measurements. The ALLSTAR application can therefore be classified as an object-oriented data layer atop a massive trove of relational data records. While the existing UNDACE tools permit strictly relational queries, ALLSTAR uses its object model as a primary interface to the contents of both the object and relational databases. Note that this technique is not equivalent to an object-relational database, which uses object-like data structures to partially extend the functionality of an existing relational data layer. One goal of the new object database model is to permit queries of a more intuitive nature. Instead of needing to know prior information about the relational database structurewhich data is in which relational entity, meanings of often cryptic attribute field descriptors, or familiarity with SQL commandsa researcher should be able to pose a query involving object attributes. To briefly show the differences between the two query approaches, consider the next example: a researcher desires a list of all ALS experiments conducted by primary investigator Gary Stutte that involved wheat as a candidate crop and seed positioning as a primary study type (Figure 1-2). select ID_EXPERIMENT from EXPERIMENT, ALS_CODES where TX_PRIMARY_INVST like %Stutte% and TX_CROP_PRF like %Wheat% and ALS_CODES.TX_DESCRIPTION like %seed positioning% and EXPERIMENT.CDGCS = ALSCODES.IDCODE A B select Experiment ID Code from Experiments where PrimaryInvestigator.lastName like Stutte and CropsInvolved.name like Wheat and StudyType.description like seed positioning Figure 1-2. Sample database query. A) Relational database SQL commands. B) Object Query Language pseudocode. The object-oriented approach takes advantage of the higher level of intuitiveness of its database schema to reduce or eliminate unnecessary join operations, and it may

PAGE 21

8 grant the researcher the ability to rapidly formulate more complex query constructions than were previously possible with relational entities and SQL alone. These query differences are examined in greater detail in the next chapter. Domain Ontology Development The final step in readying the ALLSTAR application for both humanand machine-readable content insertion is the development of an additional semantic information layer. This domain-level ontologyan explicitly defined, machine-understandable hierarchical specification of a shared conceptualizationis capable of interfacing with the object database layer and assisting with the semantic analysis of user queries (Gomez-Perez et al. 2002). Often implemented as a hierarchy of domain-specific vocabulary and interobject relationships, an ALS domain ontology attempts to capture as much background knowledge as possible about the ALS program, its experiments, its research direction and its inner workings. This ontology can also be thought of as a top-level layer of information that describes objects and behaviors common to the ALS program. More concisely stated, an ontology richly describes entities and concepts in a system. The presence of the ontology in the ALLSTAR application serves two primary purposes: database query assistance by means of a simplistic predictive reasoning algorithm that attempts to automatically classify incoming ALS search requests, and an interface for future contact by automated information search agentssometimes referred to as Web agents by proponents of the Semantic Web Activity (2001) overseen by the World Wide Web Consortium. Even a well-modeled object database is not best suited to store richly descriptive semantic information about its object contents; instead, descriptive tagging languagesmany of which have stemmed from the Extensible

PAGE 22

9 Markup Language (XML)are most frequently used in ontology construction and subsequent integration with a search/processing software application. Prominent examples of currently used ontology languages include the Resource Description Framework (RDF), Defense Advanced Research Projects Agency (DARPA) Agent Markup Language (DAML), and Ontology Inference Language/Layer (OIL). Often, similarities among languages due to their common origins allow their concurrent usage for ontology design: XML+RDF and DAML+OIL exemplify this trend. One roadblock preventing a more accelerated pace of development of machine-readable domain knowledge content appears to be the proliferation of these description languages and researchers subsequent hesitation in adopting any particular one for a project design. Each groups proponents tout the benefits of their respective language and encourage its refinement, but conflicting goals among languagessome attempt to capture a much finer level of detail about a domain than others, for examplehave thus far prevented any single predominant standard from emerging. In turn, software application developers have shied away from standardizing their domain content on only one descriptive language, often preferring to make their knowledge content available in multiple description language formats. To exacerbate this problem, no central registry yet exists for the deposition of completed ontologiestypically described using specialized schemas and made available as plaintext fileson the Internet or otherwise, for any of the above mentioned languages. A richly described ontology capable of storing actual object instances can work double duty as both a knowledge layer and an object database layer (Beck et al. 2002). However, no mature applications yet exist that combine this functionality with robust

PAGE 23

10 object storage performance and unfettered access to the underlying application programming interface. Although it is likely that future object-oriented database platforms will be based less on particular programming languages (e.g., C++, Java) and more on ontology languages, for now an ontology layer separate from the object storage layer is preferable. Project Goals of ALLSTAR The ALLSTAR application encompasses several separate objectives, each a component of the more far-reaching goal of accelerating the pace of BLSS research and development. Although ALLSTAR is intended as a single computer software application, its components are modular in design and can be improved upon individually. Each objective of the ALLSTAR application project can be described in either of two primary goal categories: assisting human readability of and access to the ALS data, and assisting machine readability of and access to the data. From a human perspective, ALLSTAR seeks to increase the intuitiveness levels of both the database schemawhich objects are relevant to ALS research and how they interactand database queries. Faster access to data subsets could allow an accelerated pace of BLSS equation development and feasibility testing by ALS researchers. Implementation of ALLSTAR on a platform-independent, Internet-accessible scale maximizes its usability in fields of Advanced Life Support, related academic research, K-12 education tools and general public use. Existing crop growth models and simulations can be validated using the ALS experimental plant biology data. And ALS researchers may be among the first to embrace direct online publication of scientific research and data, making their experimental data sets publicly available and in time foregoing the need to publish results solely in peer-reviewed or non-refereed scientific

PAGE 24

journals. Implications of ALS data availability's effects on experiment result publishing and discussion are given a closer examination in Chapter 5.

Although few data mining applications yet exist that attempt to take advantage of multiple ontologies spanning interdisciplinary knowledge bases for logical reasoning or automated cross-classification of new terms, ALLSTAR also readies the ALS knowledge base for machine-implementable search and analysis algorithms. By expanding the original ALS relational data model to include contemporary methods of data modeling and presentation, development of additional tools for purposes of data visualization, statistical analysis and ultimately predictive reasoning can be facilitated.

To sum up, development of ALLSTAR includes the design and implementation of the following: an ALS object model, an ALS domain ontology, and a prototype-level Internet-accessible computer program capable of integrating these components along with the existing ALS relational data.

PAGE 25

CHAPTER 2
OBJECT DATABASE DESIGN FOR ALLSTAR

Relational database models date back to the 1970s and are well suited for storing large collections of sequential data, such as sensor readings of environmental parameters recorded every five minutes during a typical ALS crop growth experiment. What a relational model is not particularly adept at is representing complex interactions among its constituent data items, as these relationships are relegated to a flattened two-dimensional array of attribute-value pairs. A more intuitive approach to data modeling is the object-oriented knowledge representation paradigm. Although object-oriented languages date back to the late 1960s and early 1970s, they did not enjoy widespread acceptance and usage until the 1990s (Chaudhri et al. 1998).

The first step in developing the ALLSTAR application involves the deconstruction of the existing ALS relational database and the subsequent production of a viable object model to act in its place as an interface between a human or machine client and the underlying data observations.

Legacy Relational Schema of ALS

The fundamental building block of relational data models is the relation (entity) element. Each named relation can possess a number of attributes and each data record (tuple) contains a single value per attribute. An example of an ALS database relation is excerpted in Table 2-1. The ALS relational database contains 18 relations useful for ALLSTAR inclusion, with content such as experiment descriptions, sensor readings taken every five minutes, and harvest biomass data. Each relational entity is related to at least one other by means

PAGE 26

of a primary/foreign key relationship; that is, a unique field in one relation (such as the Experiment ID code shown in Table 2-1) serves as a matching identifier for a non-unique field in another relation. This relationship allows Structured Query Language (SQL) commands to perform join operations, permitting a query to logically connect data from separate relational entities.

Table 2-1. Partial excerpt from EXPERIMENT relation of ALS relational database

    ID_EXPERIMENT   TX_CROP_PRF   D_START_DATE   D_END_DATE
    WP021           Potato        5/22/2002      7/3/2002
    SB021           Soybean       6/19/2002      8/21/2002
    LT023           Lettuce       6/24/2002      7/19/2002

As a brief example, consider this sample relational query: a researcher wishes to extract a list of all cultivars (crop varieties) associated with ALS experiments conducted in or after the year 2001. Prior knowledge of which crops were studied (as opposed to cultivars, the desired results) in these experiments is not needed, as shown in Figure 2-1.

    select TX_CULTIVAR
    from CULTIVAR, EXPERIMENT
    where EXPERIMENT.D_START_DATE >= TO_DATE('01/01/2001', 'MM/DD/YYYY')
      and CULTIVAR.TX_CROP = EXPERIMENT.TX_CROP_PRF

Figure 2-1. Sample SQL relational database query including a join operation

A more complete entity-relationship diagram (Chen 1976) for selected relations of the ALS database appears in Appendix A. A brief familiarization, however, with each of the relational entities is helpful and appears in Table 2-2.

Owing to the limitations of modeling information using a relational schema, the ALS database contains significant pockets of redundancy. For instance, 25 of the 283 records of the CULTIVAR_LIST relation described in Table 2-2 differ from other records by only one of three attribute values, an 8.8% redundancy rate stemming from the need to

PAGE 27

separate the ALS experiment records by growth chamber in the EXPERIMENT relation.

Table 2-2. Entity descriptions of ALS relational database schema

    ALS_CODES         Experiment study types, grouped by concept
    ALS_PARAMETERS    Environmental parameters: descriptions, abbreviations, measurement units
    AUTO_MEASUREMENT  Sensor types, locations, parameters measured
    CHAMBER_HISTORY   5-minute-interval recorded sensor data (e.g., CO2) from growth chambers during experiments
    CHAMBER_ORGANICS  Periodic measurements of organic gaseous compounds (e.g., ethylene) during experiments
    CHAMBER_SP        Parametric setpoints for each experiment, specific to a particular growth chamber
    CULTIVAR          List of cultivars for each agricultural crop type
    CULTIVAR_LIST     List of which cultivars were studied in a particular experiment
    DAILY_COMMENTS    Periodic researcher comments about special actions taken (milestones, anomalies) during experiment
    EXPERIMENT        Primary investigator, crops, start/end dates and descriptions of all ALS experiments
    EXPT_DETAIL       Similar to EXPERIMENT, but includes multiple entries for separate tanks within growth chambers
    HARVEST           Assigns unique ID code to each individual plant harvested
    HARVEST_COMMENTS  Control/treatment description for each plant if relevant
    HARVEST_DETAILS   Fresh and dry biomass measurements for harvested plant material, separated by location on plant
    LAB_HEADER        Assigns unique ID code to hydroponic nutrient solution used in each tank of each experiment
    LAB_HISTORY       Periodic measurements of chemical or nutrient solution parameter from each tank solution
    TANK_HISTORY      5-minute-interval recorded data (e.g., temperature) from nutrient solution tanks during experiments
    TANK_SP           Parametric setpoints for each experiment, specific to a particular tank within a growth chamber
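The join operation of Figure 2-1 can be illustrated outside of any DBMS. The following is a minimal sketch, assuming in-memory stand-ins for the EXPERIMENT and CULTIVAR relations; the class names (ExperimentRow, CultivarRow, JoinSketch) are hypothetical, the experiment rows are taken from Table 2-1, and the cultivar rows are illustrative values rather than records from the ALS database.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one tuple of the EXPERIMENT relation.
class ExperimentRow {
    final String idExperiment;
    final String txCropPrf;
    final LocalDate startDate;

    ExperimentRow(String idExperiment, String txCropPrf, LocalDate startDate) {
        this.idExperiment = idExperiment;
        this.txCropPrf = txCropPrf;
        this.startDate = startDate;
    }
}

// Hypothetical stand-in for one tuple of the CULTIVAR relation.
class CultivarRow {
    final String txCultivar;
    final String txCrop;

    CultivarRow(String txCultivar, String txCrop) {
        this.txCultivar = txCultivar;
        this.txCrop = txCrop;
    }
}

class JoinSketch {
    // Experiment rows as excerpted in Table 2-1.
    static List<ExperimentRow> sampleExperiments() {
        return Arrays.asList(
            new ExperimentRow("WP021", "Potato", LocalDate.of(2002, 5, 22)),
            new ExperimentRow("SB021", "Soybean", LocalDate.of(2002, 6, 19)),
            new ExperimentRow("LT023", "Lettuce", LocalDate.of(2002, 6, 24)));
    }

    // Illustrative cultivar rows only; not taken from the ALS database.
    static List<CultivarRow> sampleCultivars() {
        return Arrays.asList(
            new CultivarRow("Norland", "Potato"),
            new CultivarRow("Hoyt", "Soybean"),
            new CultivarRow("Waldmann's Green", "Lettuce"),
            new CultivarRow("Apogee", "Wheat"));
    }

    // Nested-loop join: keep cultivars whose crop matches an experiment
    // that started on or after the cutoff date. As in the SQL query,
    // duplicates are not removed.
    static List<String> cultivarsSince(LocalDate cutoff) {
        List<String> result = new ArrayList<>();
        for (ExperimentRow e : sampleExperiments()) {
            if (e.startDate.isBefore(cutoff)) {
                continue;
            }
            for (CultivarRow c : sampleCultivars()) {
                if (c.txCrop.equals(e.txCropPrf)) {
                    result.add(c.txCultivar);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(cultivarsSince(LocalDate.of(2001, 1, 1)));
    }
}
```

The matching condition on the crop name plays the role of the primary/foreign key comparison; a production DBMS would normally use indexes rather than a nested loop.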
A graver concern is the relation's use of a TX_CULTIVAR attribute that has over time been allowed to contain more than one cultivar name (e.g., McCall/Pixie), resulting in a loss of contextual meaning for this attribute and adversely affecting database searches. There are 62 such multiple-cultivar records, or 21.9% of the

PAGE 28

relation. Scenarios like these are identifiable throughout the ALS database, but are not unexpected. With regard to database schema evolution over time, it is often more convenient for database administrators to make minor adjustments to a slightly flawed design than to perform database-wide adjustments that force all users and relevant software applications to conform to any significant new changes.

Existing Methods of Database Access

The existing ALS experimental data is stored in an Oracle relational database and is most frequently accessed by either an Oracle client program or a Microsoft Access form/query via Microsoft's Open Database Connectivity (ODBC) driver. Database programmers with the Dynamac Corporation, a subcontractor of NASA involved with ALS research, created an interconnected system of 140 queries and 97 forms (written in Visual Basic) for Microsoft Access, collectively referred to here as the ALS Data Management System (ALSDMG). The system permits ALS researchers to query the entire relational dataset by any of a variety of field types using search keywords or numerical range statements.

Although the ALSDMG partially removes the need for an ALS researcher to know the inner workings of the database (its schema structure, field names and coded abbreviations), it replaces the low-level database access with a maze of user input interfaces often dissimilar in appearance, haphazardly arranged with respect to their underlying data records and not readily customizable. Once an ALS researcher becomes accustomed to the ALSDMG interface, it can be more efficiently traversed and manipulated. The learning curve for novice users, however, may be substantial, due in part not to a lack of programming finesse but to the modeling limitations imposed by the relational database structure and design (ALS Survey 2003).

PAGE 29

Object-Oriented Database Design

Two advantages of object design are the transparency of the underlying database storage/access mechanisms, and the ability to model a system with higher intuitiveness levels inherent in its class structure. Many programmers want the ability to permanently store objects created with object-oriented languages (a process known as object persistence) but do not want to deal with a database management system (DBMS) that is separate from the programming environment (Loomis 1995). In addition, most relational models maintain the severe handicap that only data belonging to a fixed set of data types (Table 2-3) can be faithfully represented. An exception is a database using the newest SQL3 standard, which provides for limited object-type extensions via object-relational definitions. Object-relational standards such as SQL3 are offered as ways to improve database schemata without having to migrate to a fully object-oriented platform, but the programming aspects of the ALLSTAR project benefit greatly from the decision to use a completely object-oriented schema, as discussed in Chapter 4.

Table 2-3. Typical SQL-92 standard data types supported by RDBMS applications

    Data type                  Example
    CHARACTER/VARCHAR (text)   carrot
    SMALLINT (short)           32,766
    INTEGER (long)             1,300,455
    REAL (single)              525,000,000
    DOUBLE PRECISION (double)  615,100,000,000,000
    FLOAT                      46.1726435
    DATE                       03/07/2005
    TIME                       19:25:00
    TIMESTAMP                  9/12/1990 11:55:00

Precision and scale values of numerical data types are set by the DBMS and often vary across platforms.

PAGE 30

Objects stored in an object database can be of any data type, including those defined by user-customized object class definitions. This flexibility in modeling is what allows a database designer to construct a more naturally understandable schema to represent real-world knowledge. Instead of a flat-file, tabular arrangement of rows and columns in a relational entity, data records become values of attributes assigned to individual object instances held in persistent storage.

Object Model Development

Staff interviews, background literature, and past ALS journal publications were analyzed for potential starting points for the new object model design. The existing ALS relational database was also scrutinized, revealing additional trends in data collection and usage that steered the model toward a more complete representation of ALS knowledge. Several database entities lent themselves to immediate consideration as new object types: experiments, researchers, growth chambers, nutrient solution tanks and plants. These new objects formed the foundation of what would later evolve into the completed object model. As an example, one published study discussed the effects of different lighting sources on potato growth, which in turn led to the consideration of Light Source as a new candidate database object.

Although semi-automated text parsers used in conjunction with electronic versions of publications may assist in identifying key phrases of interest, the modeling process remains largely manual throughout most of its design phases. Incremental evolution of the object model, along with ongoing analysis of the logic connecting objects to each other, cannot yet be conducted in an automated fashion. Without the ability to draw upon existing expert knowledge, in this case provided by ALS researchers already familiar

PAGE 31

with their own experimental data, it would be a substantially more difficult task to accurately model domain information in a useful manner.

Object Modeling with UML

After several ALS-domain objects (e.g., experiments, researchers, growth chambers) had been identified to serve as starting points for the database schema, an appropriate modeling language (not to be confused with the application programming language, discussed in Chapter 4) needed to be selected. Standardized usage in the database modeling industry influenced the eventual choice of the Unified Modeling Language (OMG 2003). UML is the product of collaboration within the Object Management Group, a non-profit, open consortium of companies that seeks methods of standardizing object-based software for purposes of hastening its public acceptance and use (OMG 2003).

Open-source and commercial UML integrated development environments abound. The choice of ObjectStore PSE Pro (ObjectStore 1999) again reflects previous Agricultural and Biological Engineering Department experience with this suite of software tools. Construction of a UML class diagram, with the intention of capturing primary ALS-related objects and their interrelationships, involves assigning string-based names to each object class and defining attributes of and connections between each. Unabridged class diagrams include the object class name, names and data value types of each attribute, and names and (expected return) data types of each relationship method (OMG 2003). Appendix B includes a complete UML diagram for the ALS knowledge domain; for now, an excerpt appears in Figure 2-2. Figure 2-3 is intended only to demonstrate the potentially complex object interrelationships, shown by paths connecting each related object in the class diagram.

PAGE 32

Figure 2-2. Sample UML class diagram for two ALS object database classes

Diagram details are available in Appendix B. The complexity of the figure also serves as a reminder that a UML-based object model can be equally as difficult to portray all at once as its relational predecessor. A closer inspection of the diagram may reveal UML's modeling advantages with respect to ease of schema visualization. Each relationship (line) connecting two classes in Figure 2-3 represents the attribute value of one class referencing another user-defined (non-primitive) class.

For instance, consider trying to model the following concept: a cultivar is associated with one (and only one) type of crop. The Cultivar class from Figure 2-2 has an attribute name of data type String, which is a primitive-level collection of text characters. But the ofCropType attribute is of type Crop; in other words, the value of ofCropType is not a single numerical or string value. The value is instead an instance of another class (Crop) found elsewhere in the UML diagram. This UML schema therefore attempts to model this sentence as plainly as possible: A Cultivar is an object or concept whose ofCropType attribute is of object type Crop.

Class Relationships in UML

UML class diagrams allow clustering of like concepts by means of several basic types of relationships, many of which are visible as linear paths in Figure 2-3.
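The Cultivar and Crop classes discussed for Figure 2-2 might be expressed in Java roughly as follows. This is a sketch under stated assumptions rather than the code generated for ALLSTAR: only the name and ofCropType attributes come from the text, while the constructors, accessor names and the null check are assumptions added for illustration.

```java
// Sketch of the two classes from Figure 2-2; accessors and checks are assumed.
class Crop {
    private final String name; // primitive-level text attribute

    Crop(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class Cultivar {
    private final String name;      // primitive-level text attribute
    private final Crop ofCropType;  // value is an instance of another class

    Cultivar(String name, Crop ofCropType) {
        // "One (and only one) type of crop": a single, mandatory reference.
        if (ofCropType == null) {
            throw new IllegalArgumentException("a cultivar requires a crop type");
        }
        this.name = name;
        this.ofCropType = ofCropType;
    }

    String getName() {
        return name;
    }

    Crop getOfCropType() {
        return ofCropType;
    }
}

class CultivarSketch {
    public static void main(String[] args) {
        Crop potato = new Crop("Potato");
        // "Norland" is an illustrative cultivar name, not an ALS record.
        Cultivar cultivar = new Cultivar("Norland", potato);
        System.out.println(cultivar.getName() + " is a cultivar of "
            + cultivar.getOfCropType().getName());
    }
}
```

Because ofCropType holds a reference rather than a copied crop name, every cultivar of the same crop can share one Crop instance, which is the kind of redundancy the relational CULTIVAR_LIST records could not avoid.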

PAGE 33

Figure 2-3. Object relationships in the ALS object database UML class diagram

Superclass and Subclass

Classes can inherit attributes and behaviors from parent (superclass) objects. Extensible polymorphism allows classes to define new attributes for themselves (and their subclasses) and override identically named attributes and behaviors with updated

PAGE 34

definitions. An ALS example: a Biomass Production Chamber is a subclass of Growth Chamber.

Array dimensionality

Object arrays allow UML classes to request either one or many object instances of each defined attribute or behavioral method. One-to-one, one-to-many and many-to-many relationships are crucial to a schema designer's ability to accurately portray an information domain with UML diagramming tools. From the ALS database: each Experiment involves one or more (i.e., many) Researchers.

Containment

Especially difficult to model with relational flat-file schema design tools, containment elegantly encompasses those interobject relationships often described as "contains," "has" or "is made of" in object models (Loomis 1995). For instance: each Growth Chamber contains one or more Nutrient Solution Tanks.

Java Class Generation

As previously mentioned, UML was selected as the object modeling platform because of the ease with which its class structure can be transferred directly to object-oriented source code. Likewise, Java (2003) was selected as the application programming language due primarily to its bytecode execution method allowing operating system platform independence, and its proven suitability for development of an Internet-ready application.

The process of generating Java source code files from the initial UML diagram is straightforward. ObjectStore handles the conversions completely, transforming each UML class into its own Java source file of the same name. For example, the Researcher and Experiment classes have now become Researcher.java and Experiment.java, as

PAGE 35

pictured in Figure 2-4. All new Java classes arrive with their attributes and interobject relationships intact.

    // Researcher.java
    public class Researcher {
        public Researcher() { /* ... */ }
    }

    // Experiment.java
    public class Experiment {
        public Experiment() { /* ... */ }
    }

Figure 2-4. Generating Java source code from UML class diagram

Implementing Object Persistence

The creation of Java source code alone does not allow Java objects to be persistently stored. A typical wish is to store these objects on a re-writable, non-volatile storage medium such as a hard disk. As with most programming languages, variables and objects created at the time of program execution last only as long as the program remains in the computer's physical memory (i.e., RAM). Once the program completes its execution, all values of variables and instantiated objects are lost.

Implementing object persistence for the newly created Java class types requires a post-processor, generally provided by the object database vendor. Post-processing the compiled binary Java files adds provisions for serializable object transport. In other words, class instances can now be written to non-volatile storage and kept independently of a program's execution. This seemingly transparent object storage layer is referred to as the object database. Note that the choice of object database platform ties the designer to a particular object-oriented programming language. Since this project relied on a Java/ObjectStore PSE Pro combination to begin with, switching programming

PAGE 36

languages (e.g., to C++ or Smalltalk) is not advisable, nor is changing the object database vendor (ObjectStore PSE Pro), since either change would compromise any data instances already housed in the object database. The database schema itself, however, can be moved to another application platform if necessary.

Here it is important to note once more the difference between object and relational database persistence requirements. Relational databases are seen by designers as stand-alone entities, capable of being accessed by various programming languages using the common SQL standard. They are already persistent by nature, as they are composed of fixed tables of data records and require an SQL-based interface to retrieve their set-based contents. Object databases, on the other hand, require no transformation code to reconstruct their contents. Persistent objects, once located in the object database, can be acted upon just like any other program object (Loomis 1995). This feature eliminates the need for an additional data access layer between queries and objects.

Now that the ALS object database has been properly modeled and implemented in native Java code, the task of generating an even higher-level layer of domain knowledge can be approached, as discussed in Chapter 3.
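The class relationships and the persistence idea described in this chapter can be tied together in a short sketch. It uses standard Java serialization (java.io.Serializable with object streams) purely as a stand-in; it is not the ObjectStore PSE Pro post-processor or its API, and the field names, the describe() method and the PersistenceSketch helper are assumptions for illustration. An in-memory byte array takes the place of a disk file so the example stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class NutrientSolutionTank implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;

    NutrientSolutionTank(String label) {
        this.label = label;
    }
}

class GrowthChamber implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    // Containment with one-to-many dimensionality: a chamber contains tanks.
    final List<NutrientSolutionTank> tanks = new ArrayList<>();

    GrowthChamber(String name) {
        this.name = name;
    }

    String describe() {
        return "Growth Chamber " + name;
    }
}

// Superclass/subclass: inherits name and tanks, overrides describe().
class BiomassProductionChamber extends GrowthChamber {
    private static final long serialVersionUID = 1L;

    BiomassProductionChamber(String name) {
        super(name);
    }

    @Override
    String describe() {
        return "Biomass Production Chamber " + name;
    }
}

class PersistenceSketch {
    // Writes the object graph to bytes and reads it back, standing in for
    // a store-to-disk followed by a later retrieval.
    static GrowthChamber roundTrip(GrowthChamber chamber) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(chamber);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GrowthChamber) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        BiomassProductionChamber bpc = new BiomassProductionChamber("BPC");
        bpc.tanks.add(new NutrientSolutionTank("Tank 1"));
        GrowthChamber restored = roundTrip(bpc);
        System.out.println(restored.describe() + " with "
            + restored.tanks.size() + " tank(s)");
    }
}
```

The restored object keeps its subclass type and its contained tanks without any mapping code, which is the property the text attributes to object databases; a vendor product would add transactions, queries and schema management on top of this basic idea.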

PAGE 37

CHAPTER 3
ONTOLOGY DEVELOPMENT FOR ALS

One motivation for developing an ALS-specific ontology (an explicitly defined, machine-understandable hierarchical specification of a shared conceptualization) stems from the ALS research team's desire to exercise greater control over their experimental data. By constructing an ALS domain ontology, it may be possible to capture enough background information about ALS research to allow both the ALLSTAR application and future software development efforts the chance to assist ALS researchers in examining their own data from perspectives not previously possible.

It is believed that the amount of data being collected by NASA research teams now exceeds the ability of the professionals in the field to process, manage and study (Campbell 1987). The ALS plant biology experiments are manageable in scope (far more so than other NASA-related data sets such as 2-D geographical information system (GIS) earth science plots continually generated by satellite observations) but are not able to be fully analyzed due to their complexity of structure and parametric dimensionality (ALS Survey 2003). An ALS domain ontology can serve as a tool to enhance database queries and promote future efforts involving machine readability of its extensive underlying experimental data.

What is an Ontology?

The term ontology was borrowed by computer scientists from the field of philosophical metaphysics, where it originally referred to a theory of organization of entities and their ties within a distinct system. Although there is no single way of

PAGE 38

organizing concepts within an information domain, an ontology attempts to specify a common vocabulary between different systems (Beck et al. 2002). An ontology accomplishes its task by means of establishing a formal (i.e., machine-readable) hierarchical structure of vocabulary terms associated with a particular domain. The ALLSTAR project, for example, seeks the development of a domain ontology, which restricts itself to only those concepts, theories and vocabularies specific to a particular domain, the ALS research involving BLSS components and development. If interpreting documents such as ALS experiment data or result findings requires prior knowledge to understand their contents, the ALS domain ontology can be considered as the background knowledge (Euzenat 2002).

Although official definitions vary, most ontologies share the following characteristics: shared (accessible across multiple platforms and disciplines of knowledge), formal (readable by both humans and machines), and explicit (follows strict tenets of at least one standardized ontology language, as discussed shortly). Knowledge in an ontology is typically organized in a taxonomic fashion and specified using five general components: concepts, relations, functions, axioms and instances. The ALS domain ontology used in this study, however, leaves the task of object instance storage to the object database discussed in Chapter 2. Figure 3-1 shows the relationship among the ALLSTAR application components discussed thus far.

Ontology Language Selection

Several prominent description languages suitable for ontology development presently exist. Although most of these languages are similar in syntax and purpose, each represents a different preference for the level of detail with which knowledge should be represented. Regardless, all ontology languages parallel object-oriented ideologies

PAGE 39

mandating extensibility through resource sharing. It is likely that these standards will evolve rapidly with time, possibly merging into fewer resulting languages as their usage increases.

Figure 3-1. Communication among three primary ALLSTAR components (Domain Ontology, Relational Database, Object Database)

Unified Modeling Language (UML)

In 1997, the Object Management Group (OMG) released the UML 1.0 specification, which has since undergone several revisions and version updates. As discussed in Chapter 2, UML usage is highly appropriate when it is the designer's intention to map UML classes directly to programming code or object database elements. UML draws criticism, however, for its use in representing more formal models such as ontologies. These opinions are due in part to UML's lack of contextual associations (properties) and concern for computational complexity of runtime reasoning (Kogut et al. 2002). UML is capable of modeling object class properties; however, it does not currently allow interobject relationships ("a plant grows leaves") to carry over to other objects with the same relationship name ("a flower grows petals"). Each instance of the relationship must be independently stored and created, and no simple mechanism exists to override this limitation. Current OMG research into an improved UML specification, based on their Model-Driven Architecture (MDA) standards, may remedy this shortcoming and allow future UML versions to gain ground as a viable ontology

PAGE 40

description and modeling language. In the meantime, however, the aforementioned limitations combined with a lack of support for multiple inheritance (i.e., a class cannot have more than one immediate superclass, a problem for modeling even such simple concepts as "Carbon dioxide is a molecular structure and a measurable ALS parameter of interest") have ruled out UML as the ontology language of choice for the ALLSTAR database application project.

Resource Description Framework (RDF)

RDF (2001), as shown in Figure 3-2, is an application of the Extensible Markup Language (XML) standard and is intended as a foundation for processing metadata, that is, data that describes other data. RDF Schema (RDFS) is analogous to the XML Schema research, seeking schema-specific standardization of RDF-encoded information. Likewise published by the W3C working group, RDF requires its metadata authors to designate at least one underlying schema to which the ontology makes initial reference. These underlying schemata can then be shared with other designers and extended in hopes of ultimately constructing useful, trusted, sharable sources of domain-specific knowledge (Lassila 1998).

Figure 3-2. Subsumption relationships among ontology language standards (labels: OIL, DAML, RDF(S), HTML, XML, SGML)

PAGE 41

Prior to RDF's release, the namespace of attribute names and the structure of its values went uncontrolled, easily allowing two designers to inadvertently assign non-identical names and value types to what should be identically modeled concepts. For example, the ALS model may include a Crop class as having an age attribute of type FloatingPointNumber, but perhaps another research team models an AgriculturalCrop class as having a daysAfterPlanting attribute of type Integer. Both approaches are valid, but the two models cannot easily communicate without manual intervention to map related classes to one another. RDF, although not requiring use of a central namespace registry, hopes to facilitate interaction between XML+RDF document designers and encourages extensibility of existing domain models to promote standardization of class/attribute pairs (Lassila 1998).

The DARPA Agent Markup Language (DAML)

The Defense Advanced Research Projects Agency (DARPA) began the DAML project (van Harmelen et al. 2001) with the intention of facilitating the W3C working group's vision of a future Semantic Web, essentially a gradual transformation of Internet-accessible content (web sites, multimedia, embedded knowledge) from the present HTML standard to one of the more descriptive standards derived from XML. A major extension of function DAML adds to its XML+RDF foundation is its ability to derive logical conclusions not explicitly stated in the DAML document itself. Allowing the equivalence of identifier terms (e.g., ageOfCrop and daysAfterPlanting) along with distinction of unique terms furthers the goal of DAML to provide additional expressiveness over its predecessors (Heflin et al. 2001).
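The value of declaring identifier terms equivalent can be shown with a very small lookup structure. The sketch below is not DAML syntax and does not use any ontology library; the TermEquivalence class and its method names are hypothetical, and only the two attribute names come from the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps alternative attribute names onto one canonical name.
class TermEquivalence {
    private final Map<String, String> canonicalByTerm = new HashMap<>();

    // Declares that 'term' denotes the same concept as 'canonicalTerm'.
    void declareEquivalent(String term, String canonicalTerm) {
        canonicalByTerm.put(term, canonicalTerm);
    }

    // Returns the canonical name, or the term itself if none was declared.
    String resolve(String term) {
        return canonicalByTerm.getOrDefault(term, term);
    }

    boolean sameConcept(String first, String second) {
        return resolve(first).equals(resolve(second));
    }
}

class TermEquivalenceSketch {
    public static void main(String[] args) {
        TermEquivalence terms = new TermEquivalence();
        terms.declareEquivalent("ageOfCrop", "daysAfterPlanting");
        System.out.println(terms.sameConcept("ageOfCrop", "daysAfterPlanting"));
    }
}
```

A mapping of this kind settles only the naming question. The difference in value types noted above (a floating-point age versus an integer day count) would still need an explicit conversion rule agreed upon by the two research teams.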

PAGE 42

Ontology Inference Layer (OIL)

The OIL description language specification (2000) was released by On-To-Knowledge, another consortium of university and industry researchers bent on facilitating computerized information exchange. Rather than an attempt at consolidating previous language efforts, OIL comprises multiple layers of ontology design and understanding specific to Web-enabled access to and sharing of contextual content. Each successive information layer is accessible by a specific OIL application, but the design is such that lower OIL levels (those with less content description and expressiveness) can be at least partially understood by any OIL-ready application. As with DAML, XML+RDF(S) is the core foundation of OIL.

The layered architecture of OIL paired with its similarity to DAML project efforts ultimately led to a merger of modeling concepts, called DAML+OIL (sometimes referred to simply as DAML). In turn, DAML+OIL became the foundation for yet another W3C working group revision, the Web Ontology Language (OWL), currently under development.

Advanced Ontology Operations

Modeling knowledge in the ALS domain is only the first of several future roles intended for the ALS ontology. Figure 3-3 illustrates the pyramidal hierarchy of ALS ontology development leading from its initial knowledge model, which is completed, to its eventual acceptance as a trustworthy source of scientifically proven information suitable for cross-disciplinary machine learning usage (Li 2002).

Domain logic can be derived either from statements expressed in a specialized ontology language (e.g., DAML+OIL) or by means of an application capable of extrapolating logical statements from an underlying semantic organization. Ideally, any ontology would be concise enough to allow its full set of inherent logic to be derived

PAGE 43

while minimizing redundancy within the ontology itself. An ALS-specific example might be a program capable of automatically classifying new experiments based on their known characteristics and attributes.

Figure 3-3. Advanced and future roles for developed ALS ontology (levels as labeled: Trust, Proof, Logic, ALS Ontology Vocabulary)

A richly expressed ALS ontology and its companion object data set allow for the possibility of automatic theorem learning via traditional machine learning techniques: backpropagation-based artificial neural network (ANN) algorithms, inductive inference by decision tree analysis (e.g., C4.5, ID3), and non-parametric statistical techniques for histogram data sets which cannot be assumed to follow a Gaussian (normal) distribution. Through methods such as these, the ALS ontology and others can provide a basis for future development of accelerated automation of proof testing and learning. For example, an ALS-related program might suggest crop growth model equations based on previously observed plant responses to controlled input parameters.

Regardless of the level of detail of an ontology, its contents may fall under heavy scrutiny if it originates from a questionable source. Evolving standards in digital signatures and assigned statistical probabilities of individual constituent knowledge components will likely be the forerunners in the tools used to convince the scientific community and general public of the validity and usefulness of an ontology's modeled knowledge. Today's search engines and site indices, for example, cannot limit searches to

PAGE 44

trusted data, nor can they attempt to understand the context or meaning of the data they scan (Lopatenko 2001). Familiarity with the Internet's propensity for rapidly spreading ambiguous or false information (news rumors, urban legends, celebrity gossip) should deter researchers from too quickly accepting results returned from conventional search engine queries.

Scientific literature and research traditionally have not been viewed as fields rife with falsely reported results or malicious self-interest; however, the proliferation of electronic resources such as domain ontologies available on the Internet could provide breeding grounds for a host of ill-intended information sources. The respected scientific process of peer review would falter if trust and certainty levels cannot be successfully established for ontology-based knowledge sources. An equally unfortunate scenario would arise if multiple untrusted information sources led to restrictions on access, and in turn to researchers' consensus on flawed concepts because they were unable to adequately share their research with the scientific community at large.

Implementation of ALS Ontology

Although all of the previously mentioned ontology description languages are non-proprietary open standards, academia and industry have yet to agree on which tools to standardize upon for production environments. Since the success of an ontology standard depends largely on its general acceptance and usage, the jury remains out on which family of description languages, if any, will in time prevail. Not everyone is convinced that the Semantic Web is a viable goal, however. Critics point out that in order for the idea to proliferate, independent researchers and ultimately the general public will need to retrofit their existing HTML or replace it completely to allow true semantic content analysis by a machine. These computerized "Web Agents" likely will not feel at

PAGE 45

home on the now-dominant HTML portions of the Internet, which are designed for interpretation by humans. Maximizing machine readability of data may come at the cost of sacrificing oft-abused HTML markup entirely as a means of Web-based document storage and communication (Clark 2003).

As the ALS ontology is primarily intended to serve as a source of assistance for database queries (by humans or machines) and only as a starting point for a complete domain-level ontology later suitable for interdisciplinary research sharing, no particular ontology language described thus far was an obvious favorite. This observation supported the selection of an in-house departmental UML-like design environment, Web Taxonomy (IFAS Information Technologies, University of Florida, 2000), for the ontology construction. The design environment is capable of XML input/output (for cross-platform data transport if required) and is able to serve as a graphical ontology navigation and query tool. Using generic language tools instead of one of the still-evolving ontology language standards may help lengthen the shelf life of the ontology and prolong its usefulness for future software development and information sharing efforts by the ALS research staff.

In a similar but more exhaustive procedure than the object database design discussed in Chapter 2, creating an ontology requires meticulous consideration of the types of objects (classes) to be included in the modeled ALS information domain. A practical starting point was to use the two dozen classes already defined in the ALS object database (e.g., Experiment, Crop, Researcher) as a foundation to build upon. ALS-related publications, including experiment protocols, results, technical memorandums, and

PAGE 46

equipment descriptions, along with interviews with ALS staff researchers, provided the remaining details to be embedded within the ontology schema. The manual classification and insertion of the collected ALS terms included consideration of all conceivable synonyms or alternate phrasings. For instance, "LED," "light-emitting diode," and "LED lamp" all describe the same ALS concept. The graphical depiction of ontology terms as a hierarchical grouping of related concepts was made possible by the Web Taxonomy application itself. This mode of direct visualization helped facilitate the rapid construction of ontology branches, as illustrated in Figure 3-4.

Figure 3-4. Web Taxonomy as an ontology visualization and editing tool

Likewise, multiple hierarchical levels of taxonomic classification allow terms in the new ALS ontology to more naturally represent real-world concepts. Navigation along the subsumption (superclass/subclass) relationships between terms is straightforward, as depicted in Figure 3-5. For the purpose of the ALLSTAR database application project, the ALS ontology can be used to yield a set of documents derived from either the underlying relational or object databases. Future ontology analysis layers, however, such as the proof, logic and trust modules discussed earlier, may allow the ALS ontology to provide a trusted answer to a natural-language query posed by a human or machine client.
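The synonym grouping and subsumption navigation described above can be illustrated with a minimal Java sketch. It is not the Web Taxonomy implementation: the class name OntologySketch, its methods, and the "Equipment" and "Lighting" superclasses are hypothetical and chosen only for illustration. Only the three LED phrasings come from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified ontology: each term has one superclass and any number of synonyms.
class OntologySketch {
    // Lower-cased phrase (term or synonym) -> preferred term.
    private final Map<String, String> preferredTerm = new HashMap<>();
    // Preferred term -> its direct superclass term.
    private final Map<String, String> superclassOf = new HashMap<>();

    void addTerm(String term, String superclass, String... synonyms) {
        preferredTerm.put(term.toLowerCase(Locale.ROOT), term);
        for (String synonym : synonyms) {
            preferredTerm.put(synonym.toLowerCase(Locale.ROOT), term);
        }
        if (superclass != null) {
            superclassOf.put(term, superclass);
        }
    }

    // Maps any known phrasing to its preferred term; returns null for unknown phrases.
    String resolve(String phrase) {
        return preferredTerm.get(phrase.trim().toLowerCase(Locale.ROOT));
    }

    // Walks the subsumption chain upward, from direct superclass to the root.
    List<String> ancestors(String term) {
        List<String> path = new ArrayList<>();
        for (String p = superclassOf.get(term); p != null; p = superclassOf.get(p)) {
            path.add(p);
        }
        return path;
    }

    public static void main(String[] args) {
        OntologySketch ontology = new OntologySketch();
        ontology.addTerm("Equipment", null);               // hypothetical root concept
        ontology.addTerm("Lighting", "Equipment");         // hypothetical intermediate concept
        ontology.addTerm("light-emitting diode", "Lighting", "LED", "LED lamp");
        System.out.println(ontology.resolve("LED lamp"));
        System.out.println(ontology.ancestors("light-emitting diode"));
    }
}
```

Storing every phrasing in one lookup table keeps synonym resolution a single map access, while the separate superclass map keeps the hierarchy free of duplicated synonym entries.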

PAGE 47

Figure 3-5. Multiple levels of taxonomic classification in the ALS ontology

PAGE 48

CHAPTER 4
ALLSTAR APPLICATION SOFTWARE DEVELOPMENT

With the ALS object database model and domain-level ontology completed, the next step in the development of the ALLSTAR project was the design and implementation of a prototype application capable of storing, accessing, querying and visualizing the combination of data sources in an understandable and useful fashion. As the object database model was written using UML tools, it made sense to choose a corresponding object-oriented programming language to uphold the desired transparency between the application and the underlying database. Between C++ and Java, the two most prominent object-oriented application programming languages presently available, Java was chosen for its demonstrated ease of developing Internet-ready applications and its cross-platform execution ability. Graphical user interface (GUI) and data visualization components were constructed using Java's Swing class library, client/server interaction was implemented via Java's Remote Method Invocation (RMI) library, and deployment integrity assurance was handled by Java Web Start (JWS). Each of these topics is discussed separately below.

Application Design Considerations

Software engineering guidelines predict that the more time is spent on the design phase of an application project, the less time is needed in its final adjustment and maintenance phases. A number of design approaches were attempted and subsequently discarded; therefore, the current ALLSTAR version reflects only those programming concepts and algorithms best suited for integrating the primary goals of the project:

PAGE 49

expanding the capability for human researcher access to the ALS data through a new query engine, and establishing a sound foundation for future development efforts related to machine readability and knowledge-sharing capability of the ALS information domain.

Query Processing

The Java source code classes created from the UML schema as discussed in Chapter 2 have no interobject query capabilities of their own. Instead, they rely on whichever object query language (OQL) implementation is provided by the vendor-specific object database software. In this case, the ALS object database is served by the ObjectStore PSE Pro variant of OQL. It overlaps with, and is therefore said to be compliant with, the object storage industry standards promoted by the Object Data Management Group (ODMG), of which ObjectStore's parent company is a member. It should be noted that the similarities among query languages of different object database platforms indicate a smooth transition if ever required, rendering negligible the concern for maintaining a non-proprietary solution whenever possible.

The default set of query tools included with the object database software required manual programmatic extension to meet the ALLSTAR application goals; specifically, ALLSTAR needed to use the attributes and relationships of each class as navigational tools for graphical query construction. Also desirable was the ability to dynamically filter out unwanted attributes specific only to the internal object storage mechanisms and not needed for display purposes. A final feature sought was the ability to string together limitless object queries of various types, a cornerstone ALLSTAR feature made possible only by the evolution from the legacy relational database to the newly developed object model. To implement these improvements to the default query tools, the required subroutines were coded in a previously undefined ALS

PAGE 50

domain object, arbitrarily called RootObject. By then modifying the original object data model, each ALS object class can trace its superclass roots back to RootObject along its inheritance path. Doing so allowed any newly defined ALS objects to inherit behavior provided by the customized query processing engine. If these improvements were instead implemented in the main executable code sections, individual ALS object instances would be deprived of the ability to attach themselves to a query-in-progress by examining their own contents: attribute/value pairs of any data type, primitive or user-defined.

For an ALLSTAR client, a database query can now appear to be a transparent layer atop movement (either in tabular or graphical format, as explained in a later section) through the structure of the database itself. A user may associate this navigation with "zooming in" (or "drilling down") from a general concept to a specific data subset sought for analysis. With the exception of the information contained in the relational database (e.g., growth chamber measurements, nutrient solution measurements, harvest data), the object database contents become the data of interest. Instead of flat-file relational entities, results of a database query now entail viewing the properties and relationships of one or more objects that match a specified search term or concept.

Consider the query example from Chapter 1, Figure 1-3: again a researcher wishes to extract a list of all ALS experiments conducted by primary investigator Gary Stutte that involved wheat as a candidate crop and seed positioning as a primary study type. With the new query tools in place, however, prior knowledge about the structure of the object model is not needed. Instead, a researcher can make use of the semantic model embedded within the object classes to intuitively navigate to an answer (Figure 4-1).
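The idea of placing query behavior in a common superclass can be sketched as follows. This is a minimal illustration, not the ALLSTAR source: RootObject is the name used in the text, but the methods attributes(), matches(), and related() are hypothetical, as is the use of Java reflection and of transient fields to stand in for internal storage attributes. The attribute names follow Figure 4-1. The syntax is current Java rather than the JRE 1.4 of the original project.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the thesis's RootObject; method names are illustrative only.
abstract class RootObject {
    // Collects attribute/value pairs declared between the concrete class and RootObject.
    // Static and transient fields are skipped, standing in for storage-internal attributes.
    Map<String, Object> attributes() {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Class<?> c = getClass(); c != null && c != RootObject.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) continue;
                f.setAccessible(true);
                try {
                    result.put(f.getName(), f.get(this));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return result;
    }

    // True if any simple (non-object, non-list) attribute value contains the keyword.
    boolean matches(String keyword) {
        String needle = keyword.toLowerCase();
        for (Object value : attributes().values()) {
            if (value == null || value instanceof RootObject || value instanceof List) continue;
            if (value.toString().toLowerCase().contains(needle)) return true;
        }
        return false;
    }

    // Objects reachable through object-valued attributes: the "drill down" targets.
    List<RootObject> related() {
        List<RootObject> out = new ArrayList<>();
        for (Object value : attributes().values()) {
            if (value instanceof RootObject) {
                out.add((RootObject) value);
            } else if (value instanceof List) {
                for (Object item : (List<?>) value) {
                    if (item instanceof RootObject) out.add((RootObject) item);
                }
            }
        }
        return out;
    }
}

class Researcher extends RootObject {
    String firstName;
    String lastName;
    transient long storageId = 42L; // hypothetical storage-internal field, hidden from queries
    Researcher(String firstName, String lastName) { this.firstName = firstName; this.lastName = lastName; }
}

class Crop extends RootObject {
    String commonName;
    Crop(String commonName) { this.commonName = commonName; }
}

class Experiment extends RootObject {
    String description;
    final List<Researcher> primaryInvestigators = new ArrayList<>();
    final List<Crop> cropTypesStudied = new ArrayList<>();
    Experiment(String description) { this.description = description; }
}
```

Because the behavior lives in the superclass, a class added to the model later gains keyword matching and navigation without any new query code, which is the property the text attributes to the RootObject design.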

PAGE 51

Since many ALS-domain objects contain attribute values of non-primitive types (i.e., not concrete data types such as string literals, integers, dates or floating point numbers), the resultant interobject database navigation often yields additional objects of relevance or interest to an ALLSTAR user. Previous queries based on the legacy relational database are restricted to requests for information based only on the relational entity's attribute names and data ranges. Object-based queries empower a user with the ability to pose any series of questions using keyword matches related to the object model and its higher level of intuitiveness. To consider the usefulness of object-based queries, a brief list of sample queries is shown in Table 4-1.

Figure 4-1. ALS domain database query with object-based navigation (search terms "Find: wheat," "Find: seed positioning," and "Find: Stutte," joined by "and," navigate among the classes Crop [commonName, cultivars, ...], Study Type [description, ...], Researcher [firstName, lastName, emailAddress, workPhone, department, ...], and Experiment [primaryInvestigators, description, startDate, endDate, cropTypesStudied, studyTypes, ...]; a Chamber label also appears)

Table 4-1 illustrates several key concepts. First, the object query suffers the limitation of not being able to query the relational data, as it is stored separately and is accessible only via traditional SQL queries. Data-specific information such as particular

PAGE 52

numerical ranges of oxygen pressure measurements cannot be included in an object query. Relational data is accessible but not queryable using the ALLSTAR program. Only objects in the object database, and thus defined in the object model, can be included in an object query, revealing a considerable limitation of the decision to keep some ALS data separate in its original relational form. This obstacle prevents, for instance, the ALLSTAR application from being able to process the last query listed in Table 4-1.

Table 4-1. Database queries in the ALS domain
Sample query [1] | Implementation method
Find all ALS crop growth experiments involving wheat. | Relational or object
Find wheat experiments in which harvested plants exceeded 0.5 m in height. | Relational
Find experiments whose primary investigator(s) published journal article(s) involving potato growth. | Object
Compare the gas exchange measurements between the two levels of the BPC [2] for lettuce experiments conducted in both levels simultaneously. | Relational or object
Determine a correlation coefficient between the variance of an experiment's recorded harvest data and the average annual number of journal publications written by the experiment's primary investigator. | Neither

[1] Note that processing of natural-language syntax queries is not supported in the current ALLSTAR version. Keyword pattern matching is used.
[2] Biomass Production Chamber. The BPC is the centerpiece of the ALS plant biology experiments at the Kennedy Space Center.

Despite the unconventional nature of this last query and its likely irrelevance with regard to ALS experiment analysis, the importance of being able to formulate arbitrary database queries is here again emphasized. Traditional query types should not be allowed to wholly dictate the ability of the query engine. Future efforts in automated theorem proving can take advantage of this query flexibility and possibly demonstrate unintuitive

PAGE 53

correlations among seemingly unrelated components. "Tilting at windmills," "chasing rabbits," and other indelicate descriptions of placing undue emphasis on irrelevant correlations discovered in small data sets describe a problem best left for human researchers to sift through. Several ALS staff researchers have indicated concern regarding ALLSTAR's potential for generating a wealth of tangential suggestions related to data set correlations, all of which would need to be analyzed further if even one is ultimately to be proven correct (ALS Survey 2003). Uncertainty analysis measurements of completed ALS experiments can now be integrated with additional ALS-domain knowledge (manufacturer specifications of sensors, growth chambers and nutrient solution tanks, for example), an integration made possible by the flexibility of the underlying object query processor.

As discussed in Chapter 3, the ALS domain ontology serves two roles: a schema providing data types for annotation of existing content, and a semantic context providing background knowledge of a specific domain (Euzenat 2002). The first of these roles holds direct significance to the ALLSTAR query processor; the second role serves only as a backdrop for future ALS-related software development. When a user (human or machine) inputs one or more search terms, the contents of both the domain ontology and object database can be iteratively searched for matching patterns. Should a term match the name of an object database class, the user is directed to the database node containing the result object type(s). If no database match is found, the ontology then steps in as a reserve source of information to further the query. Synonyms, partial keyword matches and related terms can all be analyzed for possible matches with existing object model classes. If still no matches are found, the user is encouraged to browse further for a

PAGE 54

possible link to the information sought. This approach seeks to minimize the amount of background information (i.e., expert knowledge) a user must expect to exercise during a typical database search-and-retrieval operation. To briefly illustrate one potential aspect of this query assistance approach, consider the hypothetical synonym lookup example shown in Figure 4-2.

Figure 4-2. Sample synonym lookup in ALS domain ontology (ALS Domain Ontology: found "light-emitting diode," with synonyms "LED lamp" and "LED." Object Database: no matches found for "light-emitting diode" or "LED lamp"; "LED" yields 8 Protocol matches, 10 Parameter matches, 6 Journal Article matches, and 56 Experiment matches)

Database Navigation

Two primary modes of database traversal have been included in the ALLSTAR prototype application: a more conventional tabular style, and a somewhat experimental graphical navigation system. Screenshots are provided in Figures 4-3 and 4-4. The aesthetics of the two methods obviously differ, and while the ALLSTAR application is not intended as an exercise in commercial-quality GUI design, advantages of each

PAGE 55

Figure 4-3. Conventional tabular style of object database navigation

Figure 4-4. Graphical object database navigation

navigation method can nonetheless be intuited. The tabular navigation style ensures that all relevant attribute data (every existing attribute value) will be displayable inside a single on-screen container object. A user should therefore expect little trouble searching for a specific value or range, as each attribute value list is alphanumerically sorted, using

PAGE 56

a modified Quicksort algorithm (Hoare 1962) to enhance application performance. Users choosing the graphical navigation style, however, cannot be assured of the on-screen location of their sought attributes or values, since these locations depend on the components of the query already in progress. The strength of the graphical method of database traversal lies in its ability to allow at-once visualization of a larger portion of the object database model and its object interrelationships. Studies of human cognitive capacity suggest that a person's task analysis abilities can be enhanced by grouping relevant information (logically related database objects, in this case) and that this visual grouping can decrease the time spent understanding a query in progress (Lee 1993).

On-Screen Visualization of Data

In an effort to minimize the potential confusion of navigating an object database, object instances are shown by class name and attribute/value pairs. To decrease screen clutter, relationships are not explicitly shown. Clicking on an ALS-domain attribute value expands the view to include the new object(s) of interest. Figure 4-5 shows a database object instance (of class type Parameter), illustrating its tabular view of attributes and corresponding values.

Figure 4-5. Tabular display of object instance in database

To show the potential for an integrated Java application capable of object database access and relational data display, ALLSTAR allows graphical and tabular displays of

PAGE 57

relational data as found in the parametric sensor measurements of each growth chamber and nutrient solution tank. The use of JFreeChart (2003), an open-source Java class library of charting and graphing tools, as a vehicle of graphical data display further exemplifies Java's ease of class extensibility. The ability to include third-party source code libraries when appropriate, and to integrate them into the project code, typifies Java programming projects such as ALLSTAR. Figure 4-6 shows a graphical view of ALS growth chamber temperature measurements, using the integrated JFreeChart library.

Figure 4-6. Graphical display of growth chamber temperature data

Relational data from the ALS database (including 5-minute growth chamber sensor readings, nutrient tank readings, and harvest data) can also be exported from ALLSTAR in a tab-delimited format for use in external applications if higher-level analysis is needed. Likewise, visual graph plots can be saved as image files. All graphs allow threshold selection via mouse-based zoom controls. By default, daily average values are initially displayed for 5-minute sensor data due to database size considerations.
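The reduction of 5-minute sensor readings to daily averages, followed by a tab-delimited export, can be sketched in a few lines of Java. This is an illustrative sketch under assumptions: the thesis does not say whether ALLSTAR computes the averages in Java or in the database, and the names SensorReading, DailyAverages, and the column headers are hypothetical. It uses the java.time and streams APIs of current Java rather than the JRE 1.4 of the original project.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical record of one sensor observation (e.g., a chamber temperature reading).
class SensorReading {
    final LocalDateTime timestamp;
    final double value;

    SensorReading(LocalDateTime timestamp, double value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

class DailyAverages {
    // Groups readings by calendar day and averages each group; TreeMap keeps days in date order.
    static Map<LocalDate, Double> dailyAverage(List<SensorReading> readings) {
        return readings.stream().collect(Collectors.groupingBy(
                r -> r.timestamp.toLocalDate(),
                TreeMap::new,
                Collectors.averagingDouble(r -> r.value)));
    }

    // Renders the averages as tab-delimited text with a header row (column names are illustrative).
    static String toTabDelimited(Map<LocalDate, Double> averages) {
        StringBuilder sb = new StringBuilder("date\tmean_value\n");
        for (Map.Entry<LocalDate, Double> entry : averages.entrySet()) {
            sb.append(entry.getKey())
              .append('\t')
              .append(String.format(Locale.ROOT, "%.2f", entry.getValue()))
              .append('\n');
        }
        return sb.toString();
    }
}
```

A full day of 5-minute readings holds 288 observations per sensor, so plotting one averaged point per day reduces the initial display load by roughly that factor, consistent with the database size consideration mentioned above.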

PAGE 58

Graphical User Interface Considerations

From a human researcher's perspective, the ALLSTAR prototype application can maximize its usefulness by allowing different modes of access to every data record (object or relational, single-record or data-subset) and by displaying multiple simultaneous conceptual relations or records. Doing so allows researchers to "get their hands around" the data and results, potentially decreasing the time necessary to retrieve pertinent results from the database itself (DeCoste 2001). The ALLSTAR application uses customized extensions of standard Java Swing components for its GUI construction, forgoing the need to build display components and object containers completely from scratch.

Application Deployment Issues

The Java programming language presents its own unique set of deployment issues. Originally, the ALLSTAR project started out as a simple applet. Over time, however, it outgrew the applet "sandbox" (the program execution restrictions imposed upon all applets) and required a shift to become a full-fledged Java application.

Security Issues

By default, all Java programs that did not originate from a user's own computer are forbidden from carrying out many system-related tasks: disk read/write access, cut/paste operations to the operating system clipboard, and remote access of network resources, to name several. These limitations were overcome by creating homemade digital certificates (as opposed to fee-based, privately authenticated security certificates) and digitally signing all Java files associated with the ALLSTAR application. Doing so allows the program full rights to a user's system, assuming the user understands and agrees to a

PAGE 59

mandatory security warning that appears before the program's first-time execution on a workstation.

Instead of burdening ALLSTAR users with a large file download size or tying up their CPU time unnecessarily, a client/server approach was adopted for the ALLSTAR design and subsequent deployment. Java's Remote Method Invocation (RMI) library was used to permit any ALLSTAR client program to connect to and communicate with the centralized ALLSTAR server utility. The server handles incoming requests for data query analysis or data table lookups, and the client concerns itself primarily with GUI elements such as data display and graphing tools. An updated depiction of the ALLSTAR system from Figure 3-1 now appears in Figure 4-7.

Compatibility Issues

Multiple Java runtime environments exist, one for every major operating system currently in use: Microsoft Windows, Sun Solaris/Unix, Linux, and Apple Macintosh OS are all supported. Cross-platform execution ability was among the features originally influencing the selection of Java as the ALLSTAR application programming platform. Although the application is designed to be accessed by any standard Web browser program, the Java Web Start (JWS) utility was used to reduce the chance that a client user's particular browser configuration would interfere with the ALLSTAR program's proper execution or on-screen display. JWS allows an application programmer to require a particular version of the Java Runtime Environment (JRE), in this case v1.4.1 or higher, before the program will open. JWS also frees the Java application from the browser entirely, allowing a user to close any current browser window sessions and still maintain connectivity with the ALLSTAR application. The combination of these JWS

PAGE 60

features helps showcase ALLSTAR as a functional standalone application despite its status as a lightweight RMI client program.

Figure 4-7. Communication among the RMI client/server application components (components shown: Object Database, Relational Database, Domain Ontology, ALLSTAR Centralized RMI Server, and ALLSTAR Java RMI Client (downloaded))
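The division of labor between a thin RMI client and a centralized query server can be sketched as follows. This is a minimal illustration only: the thesis does not list ALLSTAR's remote methods, so the QueryService interface, its findClassNames method, the "AllstarQuery" binding name, and the fixed list standing in for the object database are all hypothetical. The java.rmi calls themselves are standard library calls, and the generics syntax is current Java rather than JRE 1.4.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote interface; every remote method must declare RemoteException.
interface QueryService extends Remote {
    List<String> findClassNames(String keyword) throws RemoteException;
}

// Server-side implementation; a fixed list stands in for the object database.
class QueryServiceImpl implements QueryService {
    private static final String[] CLASS_NAMES = {"Experiment", "Crop", "Researcher", "Parameter"};

    @Override
    public List<String> findClassNames(String keyword) {
        List<String> hits = new ArrayList<>();
        for (String name : CLASS_NAMES) {
            if (name.toLowerCase().contains(keyword.toLowerCase())) {
                hits.add(name);
            }
        }
        return hits;
    }
}

class RmiSketch {
    public static void main(String[] args) throws Exception {
        // Server side: export the implementation and register its stub under a name.
        QueryServiceImpl impl = new QueryServiceImpl();
        QueryService stub = (QueryService) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("AllstarQuery", stub);

        // Client side: look up the stub and invoke it as if it were a local object.
        Registry clientView = LocateRegistry.getRegistry("localhost", 1099);
        QueryService remote = (QueryService) clientView.lookup("AllstarQuery");
        System.out.println(remote.findClassNames("exp"));

        // Unexport both remote objects so the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

In this arrangement only the interface and the returned values cross the network, which is what lets the downloaded client stay small while query processing remains on the server.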

PAGE 61

CHAPTER 5
RESULTS AND DISCUSSION

The complexities involved with the integration of ALLSTAR Java programming, UML modeling, and ontology design should not be allowed to overshadow the project's relatively few stated goals: assisting both human and machine readability of and access to the ALS data. The next sections discuss the results for each of these objectives separately, using ALS staff feedback and general observations to explain the perceived advantages and disadvantages of the approaches taken throughout the ALLSTAR project. ALS researcher comments are the product of regular e-mail feedback, semi-annual on-site interviews, and other personal communications conducted over a two-year period. Additional opinions stem from solicited but anonymous responses to an ALLSTAR user survey primarily intended for ALS staff members (Appendix C).

Human Researcher Access of ALS Data

The first of the two primary ALLSTAR goals is the design and implementation of an environment capable of facilitating ALS researcher access to both their existing relational data and any supplementary knowledge encapsulating the ALS domain in general. It was initially believed that the creation of an object model, if combined with a method (in this case, ALLSTAR) of directly accessing the model, could ease a researcher's task of gathering and analyzing data records of interest. ALLSTAR does successfully implement its intended functions as described above; however, ALS staff reactions have thus far been mixed, as explained in the next sections.

PAGE 62

Application Performance

The ALLSTAR application, although intended only as a lightweight, Internet-accessible database access/query interface, nonetheless pushes the limits of what is typically expected by a user of a lightweight, Internet-accessible Java program. Presently, it is not yet commonplace to download large Java programs that use windowing systems similar to most graphical operating system environments. Java is most likely familiar to Internet users as a web applet development language, powering small applet programs that are typically embedded within a larger HTML-based web page. It is therefore worth mentioning that although the ALLSTAR user interface resembles a typical window-based application, its client/server architecture relies upon network bandwidth to run smoothly, since most user interactions require communication with the remote database server. ALS researchers are already accustomed to delay times (1 to 5 minutes is not unusual) associated with querying their large-scale relational database. As such, researchers did not appear to be fazed by the frequent but short time delays (typically 5 to 30 seconds) incurred by many in-application commands and data requests.

Java is largely platform-independent, as program execution environments (i.e., Java run-time environments) have been developed for most commercially available operating systems currently in use: Microsoft Windows, Sun Solaris/Unix, Linux, and Apple Macintosh OS. Lack of reliance on an operating system comes at a price, however. Java's bytecode structure (a compromise between original source code and completely compiled binary files) requires the use of additional CPU operations to finish the compilation-on-the-fly process. The result is that Java application performance can vary considerably according to a workstation's CPU speed and memory specifications. ALLSTAR is no exception; both its command response and screen redraw/refresh issues

PAGE 63

were noticeably affected by different CPU/memory configurations using identical operating system versions. Certainly a user's first impressions of ALLSTAR can be marred by sluggish menu responses or display glitches, but it is believed that more robust programming techniques may allow the application to depend less on CPU/memory than does its current version. No ALS staff researchers mentioned application response time as being a significant hindrance to using the program.

Online Accessibility of Data

ALS researchers were mostly in agreement in acknowledging the usefulness of having their experimental data sets (and the newer object model) available online for in-house use. Opinion was split, however, on whether any or all of the raw data observations should be made publicly available. Concerns over the need to validate experimental data (checking for equipment errors, handling data outliers) before its public release were understandable. Of additional concern was the potential for misappropriation of the ALS data by others: lack of proper citation, disassociation of a primary investigator from a particular experiment, and conclusions drawn using invalid analysis techniques. Counterarguments from the ALS staff nonetheless urged the importance of making as much validated data available online as possible, if only to set an example for other research environments.

Several ALS staff researchers agreed that using an object model in conjunction with an ontology knowledge layer (i.e., using ALLSTAR) to query their database records was preferable to their existing query tools. Others disagreed, blaming the original relational database model for past suboptimal query performance and suggesting that a database remodeling process could benefit little from migrating to an object-oriented platform. Staff sharing this opinion also speculated upon the small size of the ALS

PAGE 64

domain concept set (28 concepts are included in the UML object model) as being insufficient to merit the development of a non-prototype version of ALLSTAR.

It should be noted that the ALLSTAR prototype development and subsequent evaluation were conducted using only a subset (about 10%) of the complete ALS relational database records, for purposes of server storage reduction and because of security restrictions on transporting data outside of the KSC complex. In-house installation of ALLSTAR at KSC was performed, however, to demonstrate the portability of the ALLSTAR code across platforms and server installations. In this case, ALLSTAR was shown to successfully query and retrieve results from the entire (100%) set of existing ALS relational database records. This type of remote installation requires little or no source code adjustment, and the ALLSTAR RMI server is compact enough to run alongside other applications on a single dedicated network server.

Machine Access of ALS Data

If the benefits of ALLSTAR to data access by humans are difficult to measure and clarify, attempting to measure the success of facilitating ALS data access by machines is tougher and more abstract still. The ALLSTAR Java application itself poses little interest to a machine-based client. Instead, the modular layers supporting the ALLSTAR interface might be of some use to future software development efforts bent on knowledge discovery and automated proof learning. Having access to an object-based data layer that is itself partially populated using a large underlying relational database should theoretically simplify a programmer's task if a program capable of interfacing with the ALS data is desired. No applications yet exist that are capable of directly interfacing with an unknown UML model and able to query the persistent storage mechanism of an unknown object database layer. Likewise,

PAGE 65

competing ontology language standards have hampered the development of applications truly capable of interacting with two or more disparate domain ontologies and drawing logical conclusions between them. Until these technologies mature further, the ALS object database model and ontology layers remain underused resources, able only to enhance the query abilities of human researchers interested specifically in ALS data.

Electronic Publication of Experimental Data

The field of scientific publishing is currently witnessing a rising interest in making information available in electronic format. Some projects, such as the Open Archives Initiative (2003), the Public Library of Science (2003), and the National Science Digital Library (2003), additionally contend that the type of information available in peer-reviewed or non-refereed scientific journals should be free of charge to all. Indications of progress by a researcher in a particular field, in addition to accomplishment of experimental findings for purposes of sharing knowledge or building career reputation credentials, today remain largely defined by the quality and frequency of a scientist's published works. It is unclear when this trend will shift toward predominantly electronic publication, and hazier still is the long-term future of scientific journals at all.

In the meantime, it is both unethical and illegal for a U.S.-based scientific journal to attempt to lay copyright claim to experimental data records that form the basis of a published report (17 US Code, 2002). An exception to this policy would be if an author chose to print all of the data observations in the report itself, a possible scenario for small data sets. A typical ALS plant growth experiment generates from 25,000 to 2,000,000 distinct observations; therefore, raw data inclusion in a journal destined for paper distribution remains unlikely. Instead, the ALS data becomes an excellent candidate for

PAGE 66

electronic publication in one or more monitored forms. Subsets of ALS data have already been successfully used to create multimedia educational tools suitable for high school students (BioBLAST 2002). NASA recently renewed its pledge to develop and support educational outreach programs, sparking institution-wide interest in generating software tools and NASA-related curricula for use in K-12 and university programs (O'Keefe 2002). It is therefore likely that subsets of the existing ALS data will continue to be made available in alternative formats (such as through ALLSTAR) for purposes of specialized interface design and improvement.

Future Directions of Research

Although the current ALLSTAR application version is not intended for public use, it provides a framework upon which to construct additional information and interface layers. Future research related to the ALLSTAR project would likely include improving the interface design, enhancing program code robustness for greater application performance, and expanding both the object data model and the domain ontology to include greater embedded knowledge. The first two research goals would be applicable to a short-term objective of developing in-house or outreach materials that seek to demonstrate principles related to ALS experiments. The latter goal instead pertains to the more distant objective of readying the ALS data set for access by an automated search utility, perhaps a web agent of the proposed Semantic Web.

Of great interest to this project is the potential for automated agents to one day integrate ALS expert knowledge with the ALS data set (in essence, combining rudimentary components such as the ALS object data model and the domain ontology) to draw logical conclusions about the natural world. These decision support systems will draw upon the large ALS data store to assist in tasks based upon predictive reasoning

PAGE 67

algorithms. One module might help detect equipment failure. Another artificial intelligence module might suggest a new ALS experiment to narrow the uncertainty of a particular set of crop growth equations. Still another could help automate the task of data validation and outlier detection. Ultimately, crop growth equations will merge with system models and simulations. With the help of automated reasoning tools such as those hinted at by applications like ALLSTAR, modeling a feasible bioregenerative life support system for a given set of mission considerations will be less daunting a task.

PAGE 68

APPENDIX A
ENTITY-RELATIONSHIP ALS DIAGRAM

This section contains the entity-relationship (ER) diagram for selected relational entities of the existing legacy ALS relational database. All lines express a one-to-many relationship (many is depicted by the symbol) between two attributes of two distinct entities.

Figure A-1. Entity-relationship diagram for existing ALS database

PAGE 69

Each of the entities in the previous ER diagram is explained further in Table A-1.

Table A-1. Entity descriptions from existing ALS relational database schema

Entity              Description
ALS_CODES           Experiment study types, grouped by concept
ALS_PARAMETERS      Environmental parameters: descriptions, abbreviations, measurement units
AUTO_MEASUREMENT    Sensor types, locations, parameters measured
CHAMBER_HISTORY     5-minute-interval recorded sensor data (e.g., CO2) from growth chambers during experiments
CHAMBER_ORGANICS    Periodic measurements of organic gaseous compounds (e.g., ethylene) during experiments
CHAMBER_SP          Parametric setpoints for each experiment, specific to a particular growth chamber
CULTIVAR            List of cultivars for each agricultural crop type
CULTIVAR_LIST       List of which cultivars were studied in a particular experiment
DAILY_COMMENTS      Periodic researcher comments about special actions taken (milestones, anomalies) during experiment
EXPERIMENT          Primary investigator, crops, start/end dates and descriptions of all ALS experiments
EXPT_DETAIL         Similar to EXPERIMENT, but includes multiple entries for separate tanks within growth chambers
HARVEST             Assigns unique ID code to each individual plant harvested
HARVEST_COMMENTS    Control/treatment description for each plant if relevant
HARVEST_DETAILS     Fresh and dry biomass measurements for harvested plant material, separated by location on plant
LAB_HEADER          Assigns unique ID code to hydroponic nutrient solution used in each tank of each experiment
LAB_HISTORY         Periodic measurements of chemical or nutrient solution parameter from each tank solution
TANK_HISTORY        5-minute-interval recorded data (e.g., temperature) from nutrient solution tanks during experiments
TANK_SP             Parametric setpoints for each experiment, specific to a particular tank within a growth chamber
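The one-to-many relationships among these entities are the kind typically traversed with a relational join. The following is a minimal sketch of such a query, written in Python with the standard sqlite3 module purely for illustration. Only the table names EXPERIMENT and CHAMBER_HISTORY come from Table A-1; every column name (expt_id, crop, investigator, recorded_at, parameter, reading) and every sample row is a hypothetical stand-in, not the actual ALS schema or data.

```python
import sqlite3

# Table names follow Table A-1; all column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EXPERIMENT (
    expt_id      INTEGER PRIMARY KEY,
    crop         TEXT,
    investigator TEXT
);
CREATE TABLE CHAMBER_HISTORY (
    expt_id     INTEGER REFERENCES EXPERIMENT(expt_id),
    recorded_at TEXT,
    parameter   TEXT,
    reading     REAL
);
""")

# One experiment row owns many sensor-history rows (one-to-many).
conn.execute("INSERT INTO EXPERIMENT VALUES (1, 'wheat', 'A. Researcher')")
conn.executemany(
    "INSERT INTO CHAMBER_HISTORY VALUES (?, ?, ?, ?)",
    [
        (1, "2003-01-01 00:00", "CO2", 1200.0),
        (1, "2003-01-01 00:05", "CO2", 1210.0),
        (1, "2003-01-01 00:00", "TEMP", 23.0),
    ],
)


def mean_reading(connection, crop, parameter):
    """Average one sensor parameter over all experiments on a given crop.

    Returns None when no matching readings exist.
    """
    row = connection.execute(
        """
        SELECT AVG(h.reading)
        FROM EXPERIMENT AS e
        JOIN CHAMBER_HISTORY AS h ON h.expt_id = e.expt_id
        WHERE e.crop = ? AND h.parameter = ?
        """,
        (crop, parameter),
    ).fetchone()
    return row[0]


print(mean_reading(conn, "wheat", "CO2"))
```

Even in this toy form, a question phrased in terms of a crop requires the researcher to know which tables hold the crop and the readings and how they are keyed to each other.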

PAGE 70

APPENDIX B
UNIFIED MODELING LANGUAGE ALS DIAGRAM

Following is the complete UML diagram for the newly generated ALS object database model, with dependency relationships depicted as arrows between objects. For clarification purposes, enlarged versions of each class also follow. Standard UML class diagram components (as shown in each figure) contain the class name, attribute names and their respective data types. Note that the RootObject class additionally has a defined method, searchFields, and that all other classes inherit this customized method for database query purposes, as explained in Chapter 2.
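The inheritance arrangement described above can be sketched in a few lines. The example below is written in Python for brevity rather than in the language of the ALLSTAR classes, and it is an assumption-laden illustration: only the names RootObject and searchFields (rendered here as search_fields) come from the text. The matching behavior, the Experiment and Cultivar attribute lists, and the sample values are hypothetical and do not reproduce the method documented in Chapter 2.

```python
class RootObject:
    """Base class whose search method is inherited unchanged by every subclass."""

    def search_fields(self, term):
        """Return the names of attributes whose text contains `term`.

        Hypothetical behavior: a case-insensitive substring match over
        every instance attribute, with names returned in sorted order.
        """
        needle = term.lower()
        return sorted(
            name
            for name, value in vars(self).items()
            if needle in str(value).lower()
        )


class Experiment(RootObject):
    # Attribute names are illustrative only.
    def __init__(self, investigator, crop, description):
        self.investigator = investigator
        self.crop = crop
        self.description = description


class Cultivar(RootObject):
    # Attribute names are illustrative only.
    def __init__(self, crop, name):
        self.crop = crop
        self.name = name


experiment = Experiment("A. Researcher", "wheat", "Wheat growth under elevated CO2")
print(experiment.search_fields("wheat"))
```

Because the method lives in the root class, a new class added to the model becomes searchable without any query code of its own.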

PAGE 71

Figure B-1. Complete UML diagram for ALS object database model

PAGE 72

Figure B-2. Expanded descriptions of individual UML class diagram components

PAGE 79

APPENDIX C
ALLSTAR USER EVALUATION

Below is the complete (albeit informal) user evaluation survey given to 22 members of the ALS research staff at the Kennedy Space Center. Although the questions attempt to maintain objectivity of scope, they are by no means considered an exhaustive treatment of all issues raised by ALS staff usage of the ALLSTAR database application. Of particular interest to this project were replies containing information related to the typical flow of existing ALS research methods and initial perceptions of the new ALLSTAR-based interface and tools. Responses to the evaluation survey were collected through the ALLSTAR application itself by means of a specialized text-entry panel. The survey requested anonymity of responses, as part of an effort to encourage frank comments. This evaluation was not intended to provide an in-depth picture of ALLSTAR reception and determination of success; as such, no statistical evaluation here applies. Likewise, evaluation responses themselves have been withheld due to the candidness afforded by their anonymous submission.

Only brief answers are requested to the seven (7) questions below; however, if time allows, feel free to elaborate on your comments.

1. Do you feel that your pace of ALS-related research is in any way hindered by limitations stemming from the current mode of access to ALS experimental data (UNDACE, Oracle, Microsoft Access, etc.)?

2. If you answered 'yes' to #1, please try to briefly explain why. For example, issues regarding data set size, network server bandwidth, database model/schema, parametric dimensionality of sensor readings, or software tools used to retrieve data records.

PAGE 80

3. Which software programs other than UNDACE/ALSDMG do you typically use in analyzing the results of an ALS experiment, either for mathematical/statistical analysis or other computations?

4. From your brief introduction to the ALLSTAR application, do you see any merit in the continued storage of ALS-related research in an online-accessible, object-oriented environment such as this one?

5. Please briefly explain your reasoning behind your response to #4.

6. One of the intended goals of ALLSTAR is to provide a sound framework for future software-related development that could one day allow computers to take greater control over scientific research operations; for instance, suggest appropriate experiments or deduce unintuitive (to a human) correlations between seemingly unrelated research environment components. In your opinion, might statistical analysis of this kind be able to assist the immediate goals of ALS program research?

7. Another long-term goal of ALLSTAR is to provide the basis of an Internet-accessible repository of scientific research accessible by both humans and machines for highly efficient knowledge representation and logical deduction purposes. What concerns, if any, do you have regarding the need for journal publication of results versus straight-to-Internet 'publication' of findings?

PAGE 81

LIST OF REFERENCES

17 US Code. Sec. 102. 2002.
Beck, H., and H. S. Pinto. 2002. Overview of approach, methodologies, standards, and tools for ontologies. In Proc. Third Agricultural Ontology Workshop. Food and Agriculture Organization of the United Nations.
BioBLAST. 2002. BioBLAST: Better Learning Through Adventure, Simulation and Telecommunications. Wheeling, W.V.: Classroom of the Future.
Campbell, W. J. 1987. The development of a prototype intelligent user interface subsystem for NASA's scientific database systems, NASA Technical Memorandum 87821.
Chaudhri, A. B., and M. E. S. Loomis. 1998. Object Databases in Practice. Upper Saddle River, N.J.: Prentice-Hall, Inc.
Chen, P. 1976. The entity-relationship model toward a unified view of data. ACM Transactions on Database Systems 1(1): 9-36.
Clark, K. G. 2003. Creative Comments: On the Uses and Abuses of Markup. XML.com. Available at: www.xml.com/pub/a/2003/01/15/creative.html. Accessed 21 Feb 2003.
Codd, E. F. 1970. A relational model for large shared databanks. Communications of the ACM 13(6): 377-387.
DB2. 2003. DB2 Universal Database. Ver. 8.1. White Plains, N.Y.: IBM Corp.
DeCoste, D. 2001. Visualizing massive multivariate time-series data. In Information Visualization in Data Mining and Knowledge Discovery. U. Fayyad, G. Grinstein, and A. Wierse, eds. San Francisco: Morgan Kaufmann Publishers.
Euzenat, J. 2002. Eight questions about semantic web annotations. IEEE Intelligent Systems 17(2): 55-62.
Excel. 2002. Excel 2002. Redmond, Wash.: Microsoft Corp.
Ford, R. 1998. Our Moments Have All Been Seized. New York Times, 27 Dec., sec. 4: 9.
Gomez-Perez, A., and O. Corcho. 2002. Ontology languages for the semantic web. IEEE Intelligent Systems 17(1): 54-60.

PAGE 82

Heflin, J., and J. Hendler. 2001. A portrait of the semantic web in action. IEEE Intelligent Systems 16(2): 54-59.
Hoare, C. A. R. 1962. Quicksort. The Computer Journal 5(1): 10-15.
Institute of Food and Agricultural Sciences (IFAS) Information Technologies, University of Florida. 2000. Web Taxonomy. Available at: orb.ifas.ufl.edu. Accessed 24 Apr 2003.
Java. 2003. Java Platform 2. Ver. 1.4.1. Santa Clara, Calif.: Sun Microsystems, Inc.
JFreeChart. 2003. JFreeChart. Ver. 0.9.8. Hertfordshire, U.K.: The Object Refinery.
Kogut, P., S. Cranefield, L. Hart, M. Dutra, K. Baclawski, M. Kokar, and J. Smith. 2002. UML for ontology development. The Knowledge Engineering Review 17(1): 61-64.
Lassila, O. 1998. Web metadata: a matter of semantics. IEEE Internet Computing 2(4): 30-37.
Lee, G. 1993. Object-Oriented GUI Application Development. Englewood Cliffs, N.J.: Prentice Hall.
Li, W. 2002. Intelligent information agent with ontology on the semantic web. In Proc. World Congress on Intelligent Control and Automation 2: 1501-1504.
Loomis, M. E. S. 1995. Object Databases: The Essentials. Boston, Mass.: Addison-Wesley Longman Publishing Co., Inc.
Lopatenko, A. 2001. Information retrieval in current research information systems. Position paper, Workshop on Knowledge Markup and Semantic Annotation at K-CAP 2001.
MATLAB. 2002. MATLAB. Ver. 6.5. Natick, Mass.: The Mathworks, Inc.
National Science Digital Library. 2003. Available at: www.nsdl.org. Accessed 19 Apr 2003.
O'Keefe, S. 2002. Pioneering the Future. Syracuse University. Syracuse, N.Y. 12 Apr.
Object Management Group. 2003. OMG-Unified Modeling Language, v1.5. Needham, Mass.: OMG Inc.
ObjectStore PSE Pro. 1999. ObjectStore PSE Pro. Ver. 6. Bedford, Mass.: Progress Software.
OIL. 2000. Ontology Inference Language. On-To-Knowledge.
Open Archives Initiative. 2003. Available at: www.openarchives.org. Accessed 19 Apr 2003.

PAGE 83

Oracle Database. 2001. Oracle9i Database. Redwood Shores, Calif.: Oracle Corp.
Public Library of Science. 2003. Available at: www.publiclibraryofscience.org. Accessed 19 Apr 2003.
Resource Description Framework. 2001. World Wide Web Consortium. Available at: www.w3.org/rdf. Accessed 9 Jan 2003.
Semantic Web. 2001. World Wide Web Consortium. Available at: www.w3.org/2001/sw. Accessed 9 Jan 2003.
Simon, H. 1971. Designing organizations for an information rich world. In Computers, Communications and the Public Interest, 37-72. M. Greenberger, ed. Baltimore: Johns Hopkins Press.
SPSS. 2002. SPSS. Ver. 11.5. Chicago, Ill.: SPSS, Inc.
SQL Server. 2000. SQL Server 2000 Enterprise Edition. Redmond, Wash.: Microsoft Corp.
Strayer, R. F., B. W. Finger, M. P. Alazraki, K. Cook, and J. L. Garland. 2002. Recovery of resources for advanced life support space applications: effect of retention time on biodegradation of two crop residues in a fed-batch, continuous stirred tank reactor. Bioresource Technology 84: 119-127.
van Harmelen, F., P. F. Patel-Schneider, and I. Horrocks. 2001. Reference Description of the DAML+OIL Ontology Markup Language. Available at: www.daml.org/2000/12/reference.html. Accessed 9 Jan 2003.
Wheeler, R. M., C. L. Mackowiak, G. W. Stutte, J. C. Sager, N. C. Yorio, L. M. Ruffe, R. E. Fortson, T. W. Dreschel, W. M. Knott, and K. A. Corey. 1996. NASA's biomass production chamber: a testbed for bioregenerative life support studies. Advances in Space Research 18: 215-224.
Wieland, P. O. 1994. Designing for human presence in space: an introduction to environmental control and life support systems. NASA Reference Publication 1324.

PAGE 84

BIOGRAPHICAL SKETCH

Christopher Davidson received a Bachelor of Science in Computer Engineering from the University of Florida in 1998. His research interests include artificial intelligence, computational game theory, and applying computerized tools of higher reasoning to classical fields of science, including agriculture, biology and statistics. In his spare time, he enjoys displaying his engineering might by dismantling household gadgets (toasters, televisions, bicycles, you-name-it) and scratching his head while he tries unsuccessfully to reconstruct them.


Permanent Link: http://ufdc.ufl.edu/UFE0000922/00001

Material Information

Title: Applying Database and Ontology Design Techniques to a NASA Biological Research Repository
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0000922:00001















APPLYING DATABASE AND ONTOLOGY DESIGN TECHNIQUES TO A NASA
BIOLOGICAL RESEARCH REPOSITORY















By

CHRISTOPHER DAVIDSON


A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF ENGINEERING

UNIVERSITY OF FLORIDA


2003

































Copyright 2003

by

Christopher Davidson















ACKNOWLEDGMENTS

This project was jointly funded by the University of Florida Agricultural and
Biological Engineering Department, the Advanced Life Support program at the Kennedy
Space Center, and the NASA Graduate Student Research Program. Without their
collective support and encouragement, ALLSTAR would not exist.

I wish to thank my advisor, Dr. Howard Beck, for his assistance throughout this
research project; also, Dr. Ray Bucklin and Dr. Doug Dankel for their continuing input.
Peter Chetirkin from Dynamac at the Kennedy Space Center was instrumental in my
comprehension of the ALS data and its potential for scientific and academic use. The rest
of the ALS staff at KSC were always quick with answers to even the most trivial of
questions. My wife, my family, and my friends unfailingly surround me with support and
love. Without them, endeavors like these would be impossible and unimportant.















FOREWORD

What information consumes is rather obvious: it consumes the attention of its
recipients. Hence, a wealth of information creates a poverty of attention and a
need to allocate that attention efficiently among the overabundance of
information sources that might consume it (Simon 1971).

Intelligence just means information now ... people who know a lot about
technology would like to console us with their faith that it's neutral, that tools
won't change human nature. But how do they know (Ford 1998)?

Nobel laureate Herbert Simon and novelist Richard Ford express similar concerns,
despite the decades separating their comments. Both ponder the deleterious effects of
prodigious sources of information influencing humankind's self-perception and pace of
life.

Scientific research by definition necessitates a tight focus of study and a hesitance
to consider immeasurable, unobservable, or statistically unwieldy properties of an
experiment. But when the experiment is the collection, organization and dissemination of
a large knowledge base, the effects on its field of interest understandably cannot be
gauged for a long time to come. Foresight and introspection urge scientific researchers to
carefully consider more than just their experimental conclusions. Without adequate
means to review and analyze experimental data, in time the information becomes
misplaced or forgotten. Luddites and technophobes are right to question the effects on
society of unbounded, limitless stores of information. The unthinkably large database of
Simon's dreams and Ford's nightmares will soon exist. What remains to be seen is how
humankind will allow these tools to shape their existence and beliefs.
















TABLE OF CONTENTS

ACKNOWLEDGMENTS

FOREWORD

LIST OF TABLES

LIST OF FIGURES

LIST OF ACRONYMS

ABSTRACT

CHAPTER

1 INTRODUCTION

Project Background
Data Acquisition and Storage
Limitations of Data Retrieval and Analysis
Object-Oriented Database Design
Domain Ontology Development
Project Goals of ALLSTAR

2 OBJECT DATABASE DESIGN FOR ALLSTAR

Legacy Relational Schema of ALS
Existing Methods of Database Access
Object-Oriented Database Design
Object Model Development
Object Modeling with UML
Class Relationships in UML
Superclass and Subclass
Array dimensionality
Containment
Java Class Generation
Implementing Object Persistence

3 ONTOLOGY DEVELOPMENT FOR ALS

What is an Ontology?
Ontology Language Selection
Unified Modeling Language (UML)
Resource Description Framework (RDF)
The DARPA Agent Markup Language (DAML)
Ontology Inference Layer (OIL)
Advanced Ontology Operations
Implementation of ALS Ontology

4 ALLSTAR APPLICATION SOFTWARE DEVELOPMENT

Application Design Considerations
Query Processing
Database Navigation
On-Screen Visualization of Data
Graphical User Interface Considerations
Application Deployment Issues
Security Issues
Compatibility Issues

5 RESULTS AND DISCUSSION

Human Researcher Access of ALS Data
Application Performance
Online Accessibility of Data
Machine Access of ALS Data
Electronic Publication of Experimental Data
Future Directions of Research

APPENDIX

A ENTITY-RELATIONSHIP ALS DIAGRAM

B UNIFIED MODELING LANGUAGE ALS DIAGRAM

C ALLSTAR USER EVALUATION

LIST OF REFERENCES

BIOGRAPHICAL SKETCH














LIST OF TABLES

Table

2-1. Partial excerpt from EXPERIMENT relation of ALS relational database

2-2. Entity descriptions of ALS relational database schema

2-3. Typical SQL-92 standard data types supported by RDBMS applications

4-1. Database queries in the ALS domain

A-1. Entity descriptions from existing ALS relational database schema
















LIST OF FIGURES

Figure

1-1 Simplified material integration model of bioregenerative life support system

1-2 Sample database query

2-1 Sample SQL relational database query including a join operation

2-2 Sample UML class diagram for two ALS object database classes

2-3 Object relationships in the ALS object database UML class diagram

2-4 Generating Java source code from UML class diagram

3-1 Communication among three primary ALLSTAR components

3-2 Subsumption relationships among ontology language standards

3-3 Advanced and future roles for developed ALS ontology

3-4 Web Taxonomy as an ontology visualization and editing tool

3-5 Multiple levels of taxonomic classification in the ALS ontology

4-1 ALS domain database query with object-based navigation

4-2 Sample synonym lookup in ALS domain ontology

4-3 Conventional tabular style of object database navigation

4-4 Graphical object database navigation

4-5 Tabular display of object instance in database

4-6 Graphical display of growth chamber temperature data

4-7 Communication among the RMI client/server application components

A-1 Entity-relationship diagram for existing ALS database

B-1 Complete UML diagram for ALS object database model

B-2 Expanded descriptions of individual UML class diagram components


















LIST OF ACRONYMS

ALLSTAR   Advanced life support/space and terrestrial research repository

ALS       Advanced life support

ALSDMG    ALS data management system

ANN       Artificial neural network

BLSS      Bioregenerative life support system

BPC       Biomass production chamber

CEC       Controlled environment chamber

CELSS     Controlled ecological life support system

DAML      DARPA agent markup language

DARPA     Defense advanced research projects agency

DBMS      Database management system

DTD       Document type definition

GIS       Geographical information system

GUI       Graphical user interface

HTML      Hypertext markup language

JRE       Java runtime environment

JWS       Java web start

NASA      National aeronautics and space administration

ODBC      Open database connectivity

ODMG      Object data management group

OWL       Web ontology language

OMG       Object management group

OQL       Object query language

OS        Operating system

RDBMS     Relational database management system

RDF(S)    Resource description framework (schema)

RMI       Remote method invocation

SGML      Standard generalized markup language

SQL       Structured query language

UML       Unified modeling language

UNDACE    Universal data acquisition and control engine

W3C       World wide web consortium

XML(S)    Extensible markup language (schema)















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Engineering

APPLYING DATABASE AND ONTOLOGY DESIGN TECHNIQUES TO A NASA
BIOLOGICAL RESEARCH REPOSITORY

By

Christopher Davidson

August 2003

Chair: Howard Beck
Major Department: Agricultural and Biological Engineering

This thesis discusses the design and implementation of ALLSTAR (Advanced Life
Support/Space and Terrestrial Research Repository), an Internet-accessible,
object-oriented database application capable of facilitating access to biological
experiment data subsets by both human researchers and automated search agents.

For decades, researchers from the National Aeronautics and Space Administration
(NASA) have conducted experiments exploring the feasibility of a bioregenerative life
support system. Such a system could potentially extend the duration of manned
spaceflight activities, including future trips to the Moon, to the planet Mars, or even
farther from Earth. Scientists at the Kennedy Space Center (KSC) in Florida involved
with the Advanced Life Support (ALS) program have conducted hundreds of biological
experiments; most studied plant development in closed-environment growth chambers.
Over fifty million data observations have been recorded; unfortunately, the underlying
relational database structure and the copious amount of data records produced have, over
time, led to difficulties in accessing and evaluating the data set.

The ALLSTAR application was developed with the intention of facilitating both
human and machine readability of and access to the ALS data. Doing so required
research components related to ALS domain knowledge analysis, construction of both an
object-oriented database model and an ALS-specific domain ontology, and the
development of a foundation for future ALS-related decision support
system and predictive reasoning components that make use of the ALLSTAR data access
model. Both the object database model and the domain ontology were successfully
merged into a single application intended for ALS researcher use. An application
evaluation was conducted and user opinions were collected. Suggestions for future
research and design work related to the ALLSTAR project are also discussed.














CHAPTER 1
INTRODUCTION

The core concept driving the development of the ALLSTAR (Advanced Life
Support/Space and Terrestrial Research Repository) database application and research
repository is the desire by scientific researchers to consolidate vast collections of data and
knowledge into a single location. Common sense suggests that a more intuitively
navigated database or user interface should enhance the accessibility of the underlying
data records. Putting these ideas into practice, however, entails a need for user acceptance
by a research staff or the general public. And if Simon's ideas (Simon 1971) are literally
interpreted, no specific model of database organization can remedy the fact that a user
interface is a crutch necessitated by the inability to model and present knowledge in a
universally understandable fashion. But the cautionary opinions of Simon and Ford (Ford
1998) probably did not take into consideration what is now thought to be the next
fundamental advance in information technology: data and information analysis by
machines, not people, to form highly organized knowledge bases from which logical
conclusions can be drawn automatically about the natural world. Although computers are
to blame for today's attention economy and many people's high-velocity lives, it is
computers that have the potential to ultimately relieve civilization of this burden of
information overload. The computer science field of artificial intelligence is counting on
database theorists and designers to help generate the vast, domain-specific knowledge
reservoirs capable of fueling the first wave of truly intelligent software applications.









Project Background

The Advanced Life Support (ALS) program, a research team working with the
National Aeronautics and Space Administration (NASA), has spent decades studying the
possibility of growing plants in controlled environments to support human life as part of a
bioregenerative life support system (BLSS). Earth, orbiting spacecraft, and even other
planetary surfaces each pose unique growth environments and design considerations. The
ALS research is poised to become one of the first of these aforementioned
machine-understandable knowledge domains. Foresight on the part of ALS researchers at the
Kennedy Space Center (KSC) in Florida resulted in archives of their biological
experiment data dating back to 1986, comprising more than fifty million individual data
observations that span more than 300 closed-chamber experiments.

Current technology readiness levels do not feasibly allow the possibility of sending

humans far into space, including such missions as NASA's well-publicized but indefinite

goal to place a human crew on the planet Mars. To do so would require either significant

advances in spacecraft rocket propulsion technology or advances in bioregenerative life

support system development. Without either of these technologies, equivalent system

mass (ESM) limitations are too prohibitive to consider sending one or more human crew

members much farther than the Moon's orbit around the Earth. The mass needed to

provide adequate sustenance, water, fuel and shielding for internal crew quarters far

exceeds plausible limits for spacecraft design (Wieland 1994). A BLSS would allow

astronauts to recycle mass, such as exhaled CO2 and metabolic waste, with the assistance

of plants as both a renewable food source and an atmospheric revitalization component.

A successful BLSS would mimic many life-cycle processes found on Earth (Figure 1-1):

CO2/O2 gas exchange between plants and humans, transpiration as a viable method of









water purification and reclamation, and human metabolic waste processing and renewal

with the assistance of selected plants (Wieland 1994).





[Figure 1-1 diagram. Components: Human, Biomass Production, Microbial Bioreactor. Flows: food, water, gray water, nutrients and CO2, inedible material and O2.]
Figure 1-1. Simplified material integration model of bioregenerative life support system


For years the ALS staff has been conducting controlled-environment plant growth

experiments. The results obtained are typically used in the analysis of space and

gravitational biology concepts, including the development and validation of crop growth

models with respect to various parametric inputs: O2/CO2 partial pressures, air

temperature and relative humidity, ambient light levels, and soil/solution pH and

electroconductivity, to name a few. Many such models seek to maximize the harvest

index of the total biomass but minimize the growth area requirements to make the most

effective use of a limited-space crop growth scenario for food source production.

Successfully tested crops matching this qualification include wheat, lettuce, soybean and

potatoes (Wheeler et al. 1996). Other candidate ALS crops are being considered for









possible BLSS roles as metabolic waste processors for wastewater or gray water (human

hygiene water) produced by a potential crew.

Data Acquisition and Storage

The current control system used by ALS researchers at KSC, the Universal Data

Acquisition and Control Engine (UNDACE), provides control and monitoring

capabilities for fifteen permanent controlled environment chambers (CECs), smaller

growth chambers, and bench-top resource recovery experiments. Similar control and

monitoring are also available for the Biomass Production Chamber (BPC), a cylindrical

steel chamber (20 m2 area, 113 m3 volume) formerly used for hypobaric tests during

NASA's 1958-1963 Mercury Project (Wheeler et al. 1996). UNDACE hardware for

monitoring and control includes a Sun SPARCstation, Opto-22 digital and analog

input/output boards, and all equipment required to route monitored sensor data to a

central Oracle relational database. UNDACE was developed at KSC and remains the

primary interface between operator and experiment (Strayer et al. 2002). Of particular

interest to the ALLSTAR project are the manner and frequency with which raw data

observations are recorded. The vast majority of experimental observations are stored

automatically to relations in the Oracle database. The underlying relational database

model itself is discussed further in Chapter 2.

Limitations of Data Retrieval and Analysis

The existing UNDACE system is adequate for monitoring ALS experiments in

progress. Researcher-defined setpoint levels (values from which the real parameter

measurement should not be allowed to statistically deviate) for parameters such as air

temperature or CO2 concentration can be checked against real-time sensor data in a

timely fashion to allow environment adjustments and to ensure the integrity of an









experiment. For purposes of post-experiment data analysis, however, the usefulness of

the underlying relational database model (Appendix A) and its associated software-based

interface tools (UNDACE) measurably falters. Data analysis typically involves a

researcher importing subsets of ALS experiment data from Oracle (via an Oracle client or

a Microsoft Access form/report) into another application, such as Excel (2002), SPSS (2002),

MATLAB (2002) or other programs, to carry out statistical analysis of the data (ALS

Survey 2003, Appendix C). As a result of using a relational database model, queries

(questions to be answered using the available data in the database) are restricted to

combinations of Structured Query Language (SQL) statements, which are not

immediately intuitive in nature. Even relevant graphical user interface (GUI) tools must

rely on SQL commands to carry out a database query.

Although heralded in the mid-1970s as a significant achievement in database

structure modeling, Codd's proposed relational model (Codd 1970) carries some

limitations as seen from the perspective of a scientific researcher trying to drill down into

a large data set to retrieve a specific desired subset for analysis purposes. Reliance on

primary and foreign keys to link relation entities introduces great potential for

redundancy in an ill-designed database schema or in a database that has outgrown its own

model structure. The existing ALS relational database falls into this latter category and

has thus fallen victim to misuse and overextension of its intended capabilities, as

explored in Chapter 2. Also, relational databases store only primitive data types (e.g.,

string/text, integer, floating-point number, date/time) with few exceptions. Lastly, the

relational database model is not capable of attempting any type of semantic interpretation

of its contents; nor can it easily express its contents as intuitively organized knowledge.









Object-Oriented Database Design

Despite the present-day widespread deployment of relational database

management systems (RDBMSs), including Oracle Database (2003), SQL Server (2000)

and DB2 Universal Database (2003) as a few large-scale commercially available

examples, the more recent field of object-oriented database design continues to grow in

popularity. Having itself arisen from frame-based knowledge representation (KR)

approaches of decades past, object-oriented design drew serious attention as a KR

paradigm in the 1990s and helped spawn several successful computer programming

languages, most prominently Smalltalk, C++ and Java. It is believed that the evolution of

the existing ALS relational data model to an object-oriented model (hereafter referred to

as an object model) may increase the intuitiveness, utility and efficiency of any human-

or machine-accessible interface attempting to browse, query or display specific data

subsets.

The first step in designing the ALLSTAR application was to analyze the

underlying ALS relational database schema in addition to all ALS-related material being

considered for inclusion in the new database. This material included ALS experiment

protocols describing each potential experiment in detail before it is carried out,

engineering specifications of sensors and other equipment used to monitor experimental

data, and relevant bibliographic citations from ALS journal publications or technical

memorandums in the public domain. Doing so allowed the construction of an object-

oriented database schema using Unified Modeling Language (UML) design tools. The

schema definition includes objects, their attributes, and their interrelationships. Owing to

the formidable size of the existing ALS relational database (50,000,000+ records), the

raw sensor data from experimental observations were kept in the relational entities, since










relational databases are already well-suited for large collections of sequential time series

measurements. The ALLSTAR application can therefore be classified as an object-

oriented data layer atop a massive trove of relational data records. While the existing

UNDACE tools permit strictly relational queries, ALLSTAR uses its object model as a

primary interface to the contents of both the object and relational databases. Note that this

technique is not equivalent to an object-relational database, which uses object-like data

structures to partially extend the functionality of an existing relational data layer.
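
The layering described above can be illustrated with a minimal Java sketch. It is not taken from ALLSTAR: the names `SensorStore`, `InMemorySensorStore`, `ExperimentObject` and the parameter label `AIR_TEMP` are hypothetical, and the readings are made-up sample values. The sketch assumes only that descriptive attributes live in the object layer while raw readings remain in a relational-style store and are fetched on request.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every class and method name here is hypothetical.
class ObjectLayerSketch {

    /** Stand-in for the relational layer that keeps the raw sensor readings. */
    interface SensorStore {
        List<Double> readingsFor(String experimentId, String parameter);
    }

    /** Toy store; a real one would issue SQL against the existing relations. */
    static class InMemorySensorStore implements SensorStore {
        private final List<String[]> rows = new ArrayList<>();

        void add(String experimentId, String parameter, double value) {
            rows.add(new String[] {experimentId, parameter, Double.toString(value)});
        }

        @Override
        public List<Double> readingsFor(String experimentId, String parameter) {
            List<Double> result = new ArrayList<>();
            for (String[] row : rows) {
                if (row[0].equals(experimentId) && row[1].equals(parameter)) {
                    result.add(Double.parseDouble(row[2]));
                }
            }
            return result;
        }
    }

    /** Object-layer view of an experiment; readings are fetched on request. */
    static class ExperimentObject {
        final String id;
        final String crop;
        private final SensorStore store;

        ExperimentObject(String id, String crop, SensorStore store) {
            this.id = id;
            this.crop = crop;
            this.store = store;
        }

        List<Double> readings(String parameter) {
            return store.readingsFor(id, parameter);
        }
    }

    public static void main(String[] args) {
        InMemorySensorStore store = new InMemorySensorStore();
        store.add("LT023", "AIR_TEMP", 23.1); // made-up sample values
        store.add("LT023", "AIR_TEMP", 23.4);
        store.add("SB021", "AIR_TEMP", 26.0);

        ExperimentObject lettuce = new ExperimentObject("LT023", "Lettuce", store);
        System.out.println(lettuce.readings("AIR_TEMP"));
    }
}
```

The client code touches only the object; which relation holds the readings is hidden behind the store interface.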

One goal of the new object database model is to permit queries of a more intuitive

nature. Instead of needing to know prior information about the relational database

structure (which data is in which relational entity, meanings of often cryptic attribute

field descriptors, or familiarity with SQL commands), a researcher should be able to pose

a query involving object attributes. To briefly show the differences between the two

query approaches, consider the next example: a researcher desires a list of all ALS

experiments conducted by primary investigator Gary Stutte that involved wheat as a

candidate crop and seed positioning as a primary study type (Figure 1-2).

A select ID_EXPERIMENT from EXPERIMENT, ALS_CODES
where TX_PRIMARY_INVST like '%Stutte%'
and TX_CROP_PRF like '%Wheat%'
and ALS_CODES.TX_DESCRIPTION like '%seed positioning%'
and EXPERIMENT.CD_GCS = ALS_CODES.ID_CODE

B select Experiment ID Code from Experiments
where PrimaryInvestigator.lastName like 'Stutte'
and CropsInvolved.name like 'Wheat'
and StudyType.description like 'seed positioning'

Figure 1-2. Sample database query. A) Relational database SQL commands. B) Object
Query Language pseudocode.

The object-oriented approach takes advantage of the higher level of intuitiveness

of its database schema to reduce or eliminate unnecessary join operations, and it may









grant the researcher the ability to rapidly formulate more complex query constructions

than were previously possible with relational entities and SQL alone. These query

differences are examined in greater detail in the next chapter.
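
As a rough illustration of querying by object attributes rather than by joined relations, the following Java sketch filters in-memory objects on the same three criteria as the example query. The class `ObjectQuerySketch`, its field names, and the sample records (including the ID codes and the investigator "Doe") are assumptions made for illustration; they are not the ALLSTAR classes or real ALS experiments.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: names loosely mirror the object-query pseudocode
// and are assumptions, not actual ALLSTAR classes.
class ObjectQuerySketch {

    static class Experiment {
        final String idCode;
        final String investigatorLastName;
        final List<String> cropNames;
        final String studyTypeDescription;

        Experiment(String idCode, String investigatorLastName,
                   List<String> cropNames, String studyTypeDescription) {
            this.idCode = idCode;
            this.investigatorLastName = investigatorLastName;
            this.cropNames = cropNames;
            this.studyTypeDescription = studyTypeDescription;
        }
    }

    /** Returns the ID codes of experiments matching all three attributes. */
    static List<String> find(List<Experiment> all, String lastName,
                             String crop, String studyType) {
        return all.stream()
                .filter(e -> e.investigatorLastName.equals(lastName))
                .filter(e -> e.cropNames.contains(crop))
                .filter(e -> e.studyTypeDescription.contains(studyType))
                .map(e -> e.idCode)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up records for demonstration only.
        List<Experiment> all = Arrays.asList(
                new Experiment("EX-A", "Stutte", Arrays.asList("Wheat"), "seed positioning"),
                new Experiment("EX-B", "Stutte", Arrays.asList("Lettuce"), "seed positioning"),
                new Experiment("EX-C", "Doe", Arrays.asList("Wheat", "Potato"), "lighting"));
        System.out.println(find(all, "Stutte", "Wheat", "seed positioning"));
    }
}
```

No join condition appears in the filter: the link between an experiment and its study type is carried by the object itself.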

Domain Ontology Development

The final step in readying the ALLSTAR application for both human- and

machine-readable content insertion is the development of an additional semantic

information layer. This domain-level ontology (an explicitly defined,

machine-understandable hierarchical specification of a shared conceptualization) is capable of

interfacing with the object database layer and assisting with the semantic analysis of user

queries (Gomez-Perez et al. 2002). Often implemented as a hierarchy of domain-specific

vocabulary and interobject relationships, an ALS domain ontology attempts to capture as

much background knowledge as possible about the ALS program, its experiments, its

research direction and its inner workings. This ontology can also be thought of as a top-

level layer of information that describes objects and behaviors common to the ALS

program. More concisely stated, an ontology richly describes entities and concepts in a

system.

The presence of the ontology in the ALLSTAR application serves two primary

purposes: database query assistance by means of a simplistic predictive reasoning

algorithm that attempts to automatically classify incoming ALS search requests, and an

interface for future contact by automated information search agents, sometimes referred

to as Web agents by proponents of the Semantic Web Activity (2001) overseen by the

World Wide Web Consortium. Even a well-modeled object database is not best suited to

store richly descriptive semantic information about its object contents; instead,

descriptive tagging languages, many of which have stemmed from the Extensible

Markup Language (XML), are most frequently used in ontology construction and

subsequent integration with a search/processing software application. Prominent

examples of currently used ontology languages include the Resource Description

Framework (RDF), Defense Advanced Research Projects Agency (DARPA) Agent

Markup Language (DAML), and Ontology Inference Language/Layer (OIL). Often,

similarities among languages due to their common origins allow their concurrent usage

for ontology design: XML+RDF and DAML+OIL exemplify this trend.

One roadblock preventing a more accelerated pace of development of machine-

readable domain knowledge content appears to be the proliferation of these description

languages and researchers' subsequent hesitation in adopting any particular one for a

project design. Each group's proponents tout the benefits of their respective language and

encourage its refinement, but conflicting goals among languages (some attempt to

capture a much finer level of detail about a domain than others, for example) have thus

far prevented any single predominant standard from emerging. In turn, software

application developers have shied away from standardizing their domain content on only

one descriptive language, often preferring to make their knowledge content available in

multiple description language formats. To exacerbate this problem, no central registry yet

exists for the deposition of completed ontologies (typically described using specialized

schemas and made available as plaintext files) on the Internet or otherwise, for any of

the above mentioned languages.

A richly described ontology capable of storing actual object instances can work

double duty as both a knowledge layer and an object database layer (Beck et al. 2002).

However, no mature applications yet exist that combine this functionality with robust









object storage performance and unfettered access to the underlying application

programming interface. Although it is likely that future object-oriented database

platforms will be based less on particular programming languages (e.g., C++, Java) and

more on ontology languages, for now an ontology layer separate from the object storage

layer is preferable.

Project Goals of ALLSTAR

The ALLSTAR application encompasses several separate objectives, each a

component of the more far-reaching goal of accelerating the pace of BLSS research and

development. Although ALLSTAR is intended as a single computer software application,

its components are modular in design and can be improved upon individually. Each

objective of the ALLSTAR application project can be described in either of two primary

goal categories: assisting human readability of and access to the ALS data, and assisting

machine readability of and access to the data.

From a human perspective, ALLSTAR seeks to increase the intuitiveness levels

of both the database schema (which objects are relevant to ALS research and how they

interact) and database queries. Faster access to data subsets could allow an accelerated

pace of BLSS equation development and feasibility testing by ALS researchers.

Implementation of ALLSTAR on a platform-independent, Internet-accessible scale

maximizes its usability in fields of Advanced Life Support, related academic research,

K-12 education tools and general public use. Existing crop growth models and

simulations can be validated using the ALS experimental plant biology data. And ALS

researchers may be among the first to embrace direct online publication of scientific

research and data, making their experimental data sets publicly available and in time

foregoing the need to publish results solely in peer-reviewed or non-refereed scientific









journals. Implications of ALS data availability's effects on experiment result publishing

and discussion are given a closer examination in Chapter 5.

Although few data mining applications yet exist that attempt to take advantage of

multiple ontologies spanning interdisciplinary knowledge bases for logical reasoning or

automated cross-classification of new terms, ALLSTAR also readies the ALS knowledge

base for machine-implementable search and analysis algorithms. By expanding the

original ALS relational data model to include contemporary methods of data modeling

and presentation, development of additional tools for purposes of data visualization,

statistical analysis and ultimately predictive reasoning can be facilitated.

To sum up, development of ALLSTAR includes the design and implementation of

the following: an ALS object model, an ALS domain ontology, and a prototype-level

Internet-accessible computer program capable of integrating these components along

with the existing ALS relational data.














CHAPTER 2
OBJECT DATABASE DESIGN FOR ALLSTAR

Relational database models date back to the 1970s and are well suited for storing

large collections of sequential data, such as sensor readings of environmental parameters

recorded every five minutes during a typical ALS crop growth experiment. What a

relational model is not particularly adept at is representing complex interactions among

its constituent data items, as these relationships are relegated to a flattened two-

dimensional array of attribute-value pairs. A more intuitive approach to data modeling is

the object-oriented knowledge representation paradigm. Although object-oriented

languages date back to the late 1960s and early 1970s, they did not enjoy widespread

acceptance and usage until the 1990s (Chaudhri et al. 1998). The first step in developing

the ALLSTAR application involves the deconstruction of the existing ALS relational

database and the subsequent production of a viable object model to act in its place as an

interface between a human or machine client and the underlying data observations.

Legacy Relational Schema of ALS

The fundamental building block of relational data models is the relation (entity)

element. Each named relation can possess a number of attributes and each data record

(tuple) contains a single value per attribute. An example of an ALS database relation is

excerpted in Table 2-1.

The ALS relational database contains 18 relations useful for ALLSTAR inclusion,

with content such as experiment descriptions, sensor readings taken every five minutes,

and harvest biomass data. Each relational entity is related to at least one other by means









Table 2-1. Partial excerpt from EXPERIMENT relation of ALS relational database
ID_EXPERIMENT   TX_CROP_PRF   D_START_DATE   D_END_DATE
WP021           Potato        5/22/2002      7/3/2002
SB021           Soybean       6/19/2002      8/21/2002
LT023           Lettuce       6/24/2002      7/19/2002


of a primary/foreign key relationship; that is, a uniquely distinct field in one relation

(such as an Experiment ID Code as shown in Table 2-1) serves as a matching identifier

for a non-unique field in another relation. This relationship allows Structured Query

Language (SQL) commands to perform join operations, permitting a query to logically

connect data from separate relational entities. As a brief example, consider this sample

relational query: a researcher wishes to extract a list of all cultivars (crop varieties)

associated with ALS experiments conducted in or after the year 2001. Prior knowledge of

which crops were studied (as opposed to cultivars, the desired results) in these

experiments is not needed, as shown in Figure 2-1.

select TX_CULTIVAR from CULTIVAR, EXPERIMENT
where EXPERIMENT.D_START_DATE >= TO_DATE('01/01/2001', 'MM/DD/YYYY')
and CULTIVAR.TX_CROP = EXPERIMENT.TX_CROP_PRF

Figure 2-1. Sample SQL relational database query including a join operation


A more complete entity-relationship diagram (Chen 1976) for selected relations of

the ALS database appears in Appendix A. A brief familiarization, however, with each of

the relational entities is helpful and appears in Table 2-2.

Owing to the limitations of modeling information using a relational schema, the

ALS database contains significant pockets of redundancy. For instance, 25 of the 283 records

of the CULTIVAR_LIST relation (Table 2-2) differ from other records by

only one of three attribute values, an 8.8% redundancy rate stemming from the need to









Table 2-2. Entity descriptions of ALS relational database schema
Relation            Description
ALS_CODES           Experiment study types, grouped by concept
ALS_PARAMETERS      Environmental parameters: descriptions, abbreviations, measurement units
AUTO_MEASUREMENT    Sensor types, locations, parameters measured
CHAMBER_HISTORY     5-minute-interval recorded sensor data (e.g., CO2) from growth chambers during experiments
CHAMBER_ORGANIC     Periodic measurements of organic gaseous compounds (e.g., ethylene) during experiments
CHAMBER_SP          Parametric setpoints for each experiment, specific to a particular growth chamber
CULTIVAR            List of cultivars for each agricultural crop type
CULTIVAR_LIST       List of which cultivars were studied in a particular experiment
DAILY_COMMENTS      Periodic researcher comments about special actions taken (milestones, anomalies) during experiment
EXPERIMENT          Primary investigator, crops, start/end dates and descriptions of all ALS experiments
EXPT_DETAIL         Similar to EXPERIMENT, but includes multiple entries for separate tanks within growth chambers
HARVEST             Assigns unique ID code to each individual plant harvested
HARVEST_COMMENTS    Control/treatment description for each plant if relevant
HARVEST_DETAILS     Fresh and dry biomass measurements for harvested plant material, separated by location on plant
LAB_HEADER          Assigns unique ID code to hydroponic nutrient solution used in each tank of each experiment
LAB_HISTORY         Periodic measurements of chemical or nutrient solution parameter from each tank solution
TANK_HISTORY        5-minute-interval recorded data (e.g., temperature) from nutrient solution tanks during experiments
TANK_SP             Parametric setpoints for each experiment, specific to a particular tank within a growth chamber



separate the ALS experiment records by growth chamber over in the EXPERIMENT

relation. More grave a concern is the relation's use of a TX_CULTIVAR attribute that

has over time been allowed to contain more than one cultivar name (e.g.,

"McCall/Pixie"), resulting in a loss of contextual meaning for this attribute and adversely

affecting database searches. 62 such multiple-cultivar records exist, or 21.9% of the









relation. Scenarios like these are identifiable throughout the ALS database, but are not

unexpected. With regard to database schema evolution over time, it is often more

convenient for database administrators to make minor adjustments to a slightly flawed

design than perform database-wide adjustments which prompt all users and relevant

software applications to forcibly conform to any significant new changes.
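
The multi-valued attribute problem can be made concrete with a short Java sketch. It assumes that a slash separates cultivar names, as in the "McCall/Pixie" example; the class and method names (`CultivarFieldSketch`, `splitCultivars`, `percentOf`) are hypothetical. The second helper simply recomputes the two percentages quoted in this section from the stated record counts.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: assumes "/" separates cultivar names in one field.
class CultivarFieldSketch {

    /** Splits a possibly multi-valued cultivar field into individual names. */
    static List<String> splitCultivars(String field) {
        List<String> names = new ArrayList<>();
        if (field == null) {
            return names;
        }
        for (String part : field.split("/")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Share of affected records as a percentage, rounded to one decimal place. */
    static double percentOf(int affected, int total) {
        return Math.round(1000.0 * affected / total) / 10.0;
    }

    public static void main(String[] args) {
        System.out.println(splitCultivars("McCall/Pixie")); // two separate names
        System.out.println(percentOf(25, 283));             // near-duplicate records
        System.out.println(percentOf(62, 283));             // multiple-cultivar records
    }
}
```

In an object model the split result would be stored as a collection-valued attribute, so each cultivar remains individually searchable.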

Existing Methods of Database Access

The existing ALS experimental data is stored in an Oracle relational database and is

most frequently accessed by either an Oracle client program or a Microsoft Access

form/query via Microsoft's Open Database Connectivity (ODBC) driver. Database

programmers with the Dynamac Corporation, a subcontractor of NASA involved with

ALS research, created an interconnected system of 140 queries and 97 forms (written in

Visual Basic) for Microsoft Access, collectively referred to here as the ALS Data

Management System (ALSDMG). The system permits ALS researchers to query the

entire relational dataset by any of a variety of field types using search keywords or

numerical range statements. Although the ALSDMG partially precludes the need for an

ALS researcher to know the inner workings of the database (its schema structure, field

names and coded abbreviations), it replaces the low-level database access with a maze of

user input interfaces often dissimilar in appearance, haphazardly arranged with respect to

their underlying data records and not readily customizable. Once an ALS researcher

becomes accustomed to the ALSDMG interface, it can be more efficiently traversed and

manipulated. The learning curve for novice users, however, may be substantial, owing in

part not to any lack of programming finesse but to the modeling limitations imposed by the

relational database structure and design (ALS Survey 2003).









Object-Oriented Database Design

Two advantages of object design are the transparency of the underlying database

storage/access mechanisms, and the ability to model a system with higher intuitiveness

levels inherent in its class structure. Many programmers want the ability to permanently

store objects created with object-oriented languages (a process known as object

persistence) but do not want to deal with a database management system (DBMS) that is

separate from the programming environment (Loomis 1995). In addition, most relational

models maintain the severe handicap that only data belonging to a fixed set of data types

(Table 2-3) can be faithfully represented. An exception is a database using the newest

SQL3 standard, which provides for limited object-type extensions via object-relational

definitions. Object-relational standards such as SQL3 are offered as ways to improve

database schemata without having to migrate to a fully object-oriented platform, but the

programming aspects of the ALLSTAR project benefit greatly from the decision to use a

Table 2-3. Typical SQL-92 standard data types supported by RDBMS applications
Data type                    Example
CHARACTER/VARCHAR (text)     'carrot'
SMALLINT (short)             32,766
INTEGER (long)               1,300,455
REAL (single)                525,000,000
DOUBLE PRECISION (double)    615,100,000,000,000
FLOAT                        46.1726435
DATE                         03/07/2005
TIME                         19:25:00
TIMESTAMP                    9/12/1990 11:55:00


completely object-oriented schema, as discussed in Chapter 4. Precision and scale values

of numerical data types are set by the DBMS and often vary across platforms.









Objects stored in an object database can be of any data type, including those

substantiated by user-customized object class type definitions. This flexibility in

modeling is what allows a database designer to construct a more naturally understandable

schema to represent real-world knowledge. Instead of a flat-file, tabular arrangement of

rows and columns in a relational entity, data records become values of attributes assigned

to individual object instances persistently stored in memory.

Object Model Development

Staff interviews, background literature, and past ALS journal publications were

analyzed for potential starting points for the new object model design. The existing ALS

relational database was also scrutinized, revealing additional trends in data collection and

usage that steered the model toward a more complete representation of ALS knowledge.

Several database entities lent themselves to immediate consideration as new object types:

experiments, researchers, growth chambers, nutrient solution tanks and plants. These new

objects formed the foundation of what would later evolve into the completed object

model. As an example, one published study discussed the effects of different lighting

sources on potato growth, which in turn led to the consideration of Light Source as a new

candidate database object.

Although semi-automated text parsers used in conjunction with electronic versions

of publications may assist in identifying key phrases of interest, the modeling process

remains largely manual throughout most of its design phases. Incremental evolution of

the object model, along with ongoing analysis of the logic connecting objects to each

other, cannot yet be conducted in an automated fashion. Without the ability to draw upon

existing expert knowledge (in this case provided by ALS researchers already familiar

with their own experimental data), it would be a substantially more difficult task to

accurately model domain information in a useful manner.

Object Modeling with UML

After identifying several ALS-domain objects (e.g., experiments, researchers,

growth chambers) to serve as starting points for the database schema, the task of selecting

an appropriate modeling language (not to be confused with the application programming

language, discussed in Chapter 4) needs to be settled. Standardized usage in the database

modeling industry influenced the eventual choice of the Unified Modeling Language

(OMG 2003). UML is the result of collaborators from the Object Management Group, a

non-profit, open consortium of companies who seek methods of standardizing object-

based software for purposes of hastening its public acceptance and use (OMG 2003).

Many open-source and commercial UML integrated development environments abound.

The choice of ObjectStore PSE Pro (ObjectStore 1999) again reflects previous

Agricultural and Biological Engineering Department experience with this suite of

software tools.

Construction of a UML class diagram, with the intention of capturing primary

ALS-related objects and their interrelationships, involves assigning string-based names

to each object class and defining attributes of and connections between each. Unabridged

class diagrams include the object class name, names and data value types of each

attribute, and names and (expected return) data types of each relationship method (OMG

2003). Appendix B includes a complete UML diagram for the ALS knowledge domain;

for now, an excerpt appears in Figure 2-2.

Figure 2-3 is intended only to demonstrate the potentially complex object

interrelationships, shown by paths connecting each related object in the class diagram.





















Figure 2-2. Sample UML class diagram for two ALS object database classes


Diagram details are available in Appendix B. The complexity of the figure also serves as

a reminder that a UML-based object model can be just as difficult to portray all at

once as its relational predecessor.

A closer inspection of the diagram may reveal UML's modeling advantages with

respect to ease of schema visualization. Each relationship (line) connecting two classes in

Figure 2-3 represents the attribute value of one class referencing another user-defined

(non-primitive) class. For instance, consider trying to model the following concept: a

cultivar is associated with one (and only one) type of crop. The Cultivar class from

Figure 2-2 has an attribute name of data type String, which is a primitive-level collection

of text characters. But the ofCropType attribute is of type Crop; in other words, the value

of ofCropType is not a single numerical or string value. The value is instead an instance

of another class (Crop) found elsewhere in the UML diagram. This UML schema

therefore attempts to model this sentence as plainly as possible: "A Cultivar is an object

or concept whose ofCropType attribute is of object type Crop."
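
The Cultivar example translates almost directly into Java. In the sketch below, the attribute names `name` and `ofCropType` follow the class diagram; the wrapper class `CultivarModelSketch`, the constructors, the `name` attribute on `Crop`, and the placeholder cultivar are assumptions added for illustration.

```java
// Illustrative sketch only: constructors and the Crop.name attribute are assumed.
class CultivarModelSketch {

    static class Crop {
        final String name;

        Crop(String name) {
            this.name = name;
        }
    }

    static class Cultivar {
        final String name;      // primitive-level text attribute
        final Crop ofCropType;  // reference to exactly one Crop instance

        Cultivar(String name, Crop ofCropType) {
            if (ofCropType == null) {
                throw new IllegalArgumentException("a cultivar needs exactly one crop");
            }
            this.name = name;
            this.ofCropType = ofCropType;
        }
    }

    public static void main(String[] args) {
        Crop wheat = new Crop("Wheat");
        Cultivar placeholder = new Cultivar("Cultivar X", wheat); // placeholder name
        System.out.println(placeholder.name + " is a cultivar of "
                + placeholder.ofCropType.name);
    }
}
```

Because `ofCropType` holds an object reference rather than a text key, no join is needed to move from a cultivar to its crop.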

Class Relationships in UML

UML class diagrams allow clustering of like concepts by means of several basic

types of relationships, many of which are visible as linear paths in Figure 2-3.


Plant
+ experimentUsedIn: Experiment
+ trayLocatedIn: Tray
+ researcherPlantComments: Document[]
+ cultivarType: Cultivar
+ plantIDNumber: Integer
+ harvestMeasurements: Observation[]

Cultivar
+ name: String
+ description: String
+ ofCropType: Crop
+ researcherComments: Document[]














[Figure 2-3 diagram. Legible class labels include ALSImage, Book, Experimental Protocol, NutrientSolution and Observation.]
Figure 2-3. Object relationships in the ALS object database UML class diagram





Superclass and Subclass

Classes can inherit attributes and behaviors from parent (superclass) objects.

Extensible polymorphism allows classes to define new attributes for themselves (and

their subclasses) and override identically named attributes and behaviors with updated

definitions. An ALS example: a Biomass Production Chamber is a subclass of Growth

Chamber.
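The superclass/subclass relationship described above can be sketched in Java. This is only an illustration under stated assumptions, not code from the ALS schema: the two class names follow the ALS example, while the description and levelCount attributes and the summary method are hypothetical.

```java
// Hypothetical sketch: class names follow the ALS example; attributes and methods are assumed.
class InheritanceSketch {
    public static void main(String[] args) {
        GrowthChamber chamber = new BiomassProductionChamber("BPC", 2);
        // The subclass definition of summary() is selected at run time.
        System.out.println(chamber.summary());
    }
}

class GrowthChamber {
    protected final String description; // inherited by every subclass

    GrowthChamber(String description) {
        this.description = description;
    }

    String summary() {
        return "Growth chamber: " + description;
    }
}

class BiomassProductionChamber extends GrowthChamber {
    private final int levelCount; // new attribute defined only by the subclass

    BiomassProductionChamber(String description, int levelCount) {
        super(description);
        this.levelCount = levelCount;
    }

    @Override
    String summary() { // updated definition of an identically named behavior
        return "Biomass Production Chamber: " + description + " (" + levelCount + " levels)";
    }
}
```

The subclass reuses the inherited description attribute, adds one of its own, and replaces the inherited behavior with an updated definition.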

Array dimensionality

Object arrays allow UML classes to request either one or many object instances of

each defined attribute or behavioral method. One-to-one, one-to-many and many-to-many

relationships are crucial to a schema designer's ability to accurately portray an

information domain with UML diagramming tools. From the ALS database: each

Experiment involves one or more (i.e., many) Researchers.

Containment

Especially difficult to model with relational flat-file schema design tools,

containment elegantly encompasses those interobject relationships often described as

"contains," "has" or "is made of" in object models (Loomis 1995). For instance: each

Growth Chamber contains one or more Nutrient Solution Tanks.
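Array dimensionality and containment can both be sketched in a few lines of Java. The class names follow the two ALS examples above; every field, constructor, and method name is an assumption made for illustration, and the "one or more" rule is enforced here by hand rather than by any UML or ObjectStore facility.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: class names follow the ALS examples; fields and methods are assumed.
class ContainmentSketch {
    public static void main(String[] args) {
        Experiment experiment = new Experiment(
                new Researcher("Researcher A"), new Researcher("Researcher B"));
        GrowthChamber chamber = new GrowthChamber(new NutrientSolutionTank("Tank 1"));
        chamber.addTank(new NutrientSolutionTank("Tank 2"));
        System.out.println(experiment.getResearchers().length + " researchers, "
                + chamber.getTanks().size() + " tanks");
    }
}

class Researcher {
    final String name;
    Researcher(String name) { this.name = name; }
}

class Experiment {
    // One-to-many: each experiment involves one or more researchers.
    private final Researcher[] researchers;

    Experiment(Researcher... researchers) {
        if (researchers == null || researchers.length == 0) {
            throw new IllegalArgumentException("an experiment needs at least one researcher");
        }
        this.researchers = researchers.clone();
    }

    Researcher[] getResearchers() { return researchers.clone(); }
}

class NutrientSolutionTank {
    final String label;
    NutrientSolutionTank(String label) { this.label = label; }
}

class GrowthChamber {
    // Containment: the chamber holds its tanks and always has at least one.
    private final List<NutrientSolutionTank> tanks = new ArrayList<>();

    GrowthChamber(NutrientSolutionTank firstTank) { addTank(firstTank); }

    void addTank(NutrientSolutionTank tank) {
        if (tank == null) {
            throw new IllegalArgumentException("tank must not be null");
        }
        tanks.add(tank);
    }

    List<NutrientSolutionTank> getTanks() { return Collections.unmodifiableList(tanks); }
}
```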

Java Class Generation

As previously mentioned, UML was selected as the object modeling platform

because of the ease with which its class structure can be transferred directly to object-

oriented source code. Likewise, Java (2003) was selected as the application programming

language due primarily to its bytecode execution method allowing operating system

platform independence, and its proven suitability for development of an Internet-ready

application.

The process of generating Java source code files from the initial UML diagram is

straightforward. ObjectStore handles the conversions completely, transforming each

UML class into its own Java source file of the same name. For example, the Researcher

and Experiment classes have now become Researcher.java and Experiment.java, as











Researcher      public class Researcher {
                    public Researcher() { /* ... */ }
                }

Experiment      public class Experiment {
                    public Experiment() { /* ... */ }
                }


Figure 2-4. Generating Java source code from UML class diagram


pictured in Figure 2-4. Each generated Java class retains the attributes and

interobject relationships defined in its UML counterpart.

Implementing Object Persistence

The creation of Java source code alone does not allow Java objects to be

stored persistently. Typically, these objects must be written to a rewritable,

non-volatile storage medium such as a hard disk. As with most programming

languages, variables and objects created at the time of program execution last only as

long as the program remains in the computer's physical memory (i.e., RAM). Once the

program completes its execution, all values of variables and instantiated objects are

lost. Implementing object persistence for the newly created Java

class types requires a post-processor, generally provided by the object database vendor.

Post-processing the compiled binary Java files adds provisions for serializable object

transport. In other words, class instances can now be written to disk and stored

independently of a program's execution. This seemingly transparent object storage layer

is referred to as the object database. Note that the choice of object database platforms ties

the designer to a particular object-oriented programming language. Since this project

relied on a Java/ObjectStore PSE Pro combination from the outset, neither the programming

language (e.g., to C++ or Smalltalk) nor the object database vendor (ObjectStore PSE Pro)

can be changed without compromising any data instances already housed

in the object database. The database schema itself, however, can be moved to another

application platform if necessary.
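The ObjectStore post-processor is vendor-specific and is not reproduced here. As a stand-in, the following sketch uses standard Java serialization (java.io.Serializable), a different and much simpler mechanism, only to illustrate an object's state outliving the program variables that created it. The Cultivar fields follow Figure 2-3; the temporary file name and the sample values are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Stand-in sketch: plain Java serialization, NOT the ObjectStore PSE Pro API.
class PersistenceSketch {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File store = File.createTempFile("als-sketch", ".ser"); // hypothetical file name
        store.deleteOnExit();
        try (OutputStream out = new FileOutputStream(store)) {
            save(new Cultivar("example cultivar", "placeholder description"), out);
        }
        // The original object is no longer referenced; its state is read back from disk.
        try (InputStream in = new FileInputStream(store)) {
            Cultivar restored = load(in);
            System.out.println(restored.name + ": " + restored.description);
        }
    }

    static void save(Cultivar cultivar, OutputStream target) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(target);
        out.writeObject(cultivar);
        out.flush();
    }

    static Cultivar load(InputStream source) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(source);
        return (Cultivar) in.readObject();
    }
}

class Cultivar implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String description;

    Cultivar(String name, String description) {
        this.name = name;
        this.description = description;
    }
}
```

Unlike an object database, this approach stores a single object graph per stream and offers no query capability, which is part of what a dedicated persistence layer adds.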

Here it is important to note once more the difference between object and relational

database persistence requirements. Relational databases are seen by designers as stand-

alone entities, capable of being accessed by various programming languages using the

common SQL standard. They are already persistent by nature, as they are composed of

fixed tables of data records and require an SQL-based interface to retrieve their set-based

contents. Object databases, on the other hand, require no transformation code to

reconstruct their contents. Persistent objects, once located in the object database, can be

acted upon just like any other program object (Loomis 1995). This feature eliminates the

need for an additional data access layer between queries and objects. Now that the ALS

object database has been properly modeled and implemented in native Java code, the task

of generating an even higher-level layer of domain knowledge can be approached, as

discussed in Chapter 3.














CHAPTER 3
ONTOLOGY DEVELOPMENT FOR ALS

One motivation for developing an ALS-specific ontology (an explicitly defined,

machine-understandable hierarchical specification of a shared conceptualization) stems

from the ALS research team's desire to exercise greater control over their experimental

data. By constructing an ALS domain ontology, it may be possible to capture enough

background information about ALS research to allow both the ALLSTAR application

and future software development efforts the chance to assist ALS researchers in

examining their own data from perspectives not previously possible.

It is believed that the amount of data being collected by NASA research teams now

exceeds the ability of the professionals in the field to process, manage and study

(Campbell 1987). The ALS plant biology experiments are manageable in scope, far

more so than other NASA-related data sets such as 2-D geographical information system

(GIS) earth science plots continually generated by satellite observations, but cannot

be fully analyzed due to their complexity of structure and parametric dimensionality

(ALS Survey 2003). An ALS domain ontology can serve as a tool to enhance database

queries and promote future efforts involving machine readability of its extensive

underlying experimental data.

What is an Ontology?

The term ontology was borrowed by computer scientists from the field of

philosophical metaphysics, where it originally referred to a theory of organization of

entities and their ties within a distinct system. Although there is no single way of









organizing concepts within an information domain, an ontology attempts to specify a

common vocabulary between different systems (Beck et al. 2002).

An ontology accomplishes its task by means of establishing a formal (i.e., machine-

readable) hierarchical structure of vocabulary terms associated with a particular domain.

The ALLSTAR project, for example, seeks the development of a domain ontology, which

restricts itself to only those concepts, theories and vocabularies specific to a particular

domain, the ALS research involving BLSS components and development. If interpreting

documents such as ALS experiment data or result findings requires prior knowledge to

understand their contents, the ALS domain ontology can be considered as the background

knowledge (Euzenat 2002).

Although official definitions vary, most ontologies share the following

characteristics: shared (accessible across multiple platforms and disciplines of

knowledge), formal (readable by both humans and machines), and explicit (follows strict

tenets of at least one standardized ontology language, as discussed shortly). Knowledge

in an ontology is typically organized in a taxonomic fashion and specified using five

general components: concepts, relations, functions, axioms and instances. The ALS

domain ontology used in this study, however, leaves the task of object instance storage to

the object database discussed in Chapter 2. Figure 3-1 shows the relationship among the

ALLSTAR application components discussed thus far.

Ontology Language Selection

Several prominent description languages suitable for ontology development

presently exist. Although most of these languages are similar in syntax and purpose, each

represents a different preference for the level of detail with which knowledge should be

represented. Regardless, all ontology languages parallel object-oriented ideologies












[Diagram: Relational Database, Object Database, Domain Ontology]




Figure 3-1. Communication among three primary ALLSTAR components


mandating extensibility through resource sharing. It is likely that these standards will

evolve rapidly with time, possibly merging into fewer resulting languages as their usage

increases.

Unified Modeling Language (UML)

In 1997, the Object Management Group (OMG) adopted the UML 1.1

specification, which has since undergone several revisions and version updates. As

discussed in Chapter 2, UML usage is highly appropriate when it is the designer's

intention to map UML classes directly to programming code or object database elements.

UML draws criticism, however, for its use in representing more formal models such as

ontologies. These opinions are due in part to UML's lack of contextual associations

(properties) and concern for computational complexity of runtime reasoning (Kogut et al.

2002). UML is capable of modeling object class properties; however, it does not

currently allow interobject relationships (a plant grows leaves) to carry over to other

objects with the same relationship name (a flower grows petals). Each instance of the

relationship must be independently created and stored, and no simple mechanism exists

to override this limitation. Current OMG research into an improved UML specification,

based on their Model-Driven Architecture (MDA) standards, may remedy this

shortcoming and allow future UML versions to gain ground as a viable ontology









description and modeling language. In the meantime, however, the aforementioned

limitations combined with a lack of support for multiple inheritance (i.e., a class cannot

have more than one immediate superclass, a problem for modeling even such simple

concepts as "Carbon dioxide is a molecular structure and a measurable ALS parameter of

interest") have ruled out UML as the ontology language of choice for the ALLSTAR

database application project.
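Java, the language to which the ALS UML classes were mapped, imposes a comparable rule on classes: a class may extend only one superclass. One common workaround is to model each role as an interface. The sketch below is purely illustrative and is not something the ALLSTAR project is stated to have used; all type names, method names, and the "ppm" unit are assumptions.

```java
// Hypothetical sketch: interfaces standing in for multiple superclasses.
class MultipleClassificationSketch {
    public static void main(String[] args) {
        CarbonDioxide co2 = new CarbonDioxide();
        MolecularStructure asMolecule = co2;     // viewed as a molecular structure
        MeasurableParameter asParameter = co2;   // viewed as a measurable ALS parameter
        System.out.println(asMolecule.formula() + " measured in " + asParameter.unit());
    }
}

interface MolecularStructure {
    String formula();
}

interface MeasurableParameter {
    String unit();
}

// A class may implement any number of interfaces, but extend only one class.
class CarbonDioxide implements MolecularStructure, MeasurableParameter {
    @Override
    public String formula() {
        return "CO2";
    }

    @Override
    public String unit() {
        return "ppm"; // assumed unit, for illustration only
    }
}
```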

Resource Description Framework (RDF)

RDF (2001), as shown in Figure 3-2, is an application of the Extensible

Markup Language (XML) standard and is intended as a foundation for processing

metadata, that is, data that describes other data. RDF Schema (RDFS) is analogous to the XML

Schema research seeking schema-specific standardization of RDF-encoded information.

Likewise published by the W3C working group, RDF requires its metadata authors to

designate at least one underlying schema that the ontology makes initial reference to.

These underlying schemata can then be shared with other designers and extended upon in

hopes of ultimately constructing useful, trusted, sharable sources of domain-specific

knowledge (Lassila 1998).


Figure 3-2. Subsumption relationships among ontology language standards











Prior to RDF's release, the namespace of attribute names and the structure of their

values went uncontrolled, easily allowing two designers to inadvertently assign non-

identical names and value types to what should be identically modeled concepts. For

example, the ALS model may include a Crop class as having an age attribute of type

FloatingPointNumber, but perhaps another research team models an AgriculturalCrop

class as having a daysAfterPlanting attribute of type Integer. Both approaches are valid,

but the two models cannot easily communicate without manual intervention to map

related classes to one another. RDF, although not requiring use of a central namespace

registry, hopes to facilitate interaction between XML+RDF document designers and

encourages extensibility of existing domain models to promote standardization of

class/attribute pairs (Lassila 1998).
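The manual intervention mentioned above amounts to hand-written mapping code between the two models. A minimal sketch follows; the class and attribute names come from the example in the text, while the toCrop method and the assumption that age is expressed in days are hypothetical.

```java
// Hypothetical sketch of a hand-written bridge between two independently designed models.
class ModelMappingSketch {
    public static void main(String[] args) {
        AgriculturalCrop external = new AgriculturalCrop(21);
        Crop local = toCrop(external);
        System.out.println("age = " + local.age);
    }

    // Manual mapping: both the attribute name and the value type must be converted.
    // Assumes, for illustration only, that Crop.age is expressed in days.
    static Crop toCrop(AgriculturalCrop source) {
        return new Crop(source.daysAfterPlanting.doubleValue());
    }
}

class Crop {
    final double age; // FloatingPointNumber in the ALS model

    Crop(double age) {
        this.age = age;
    }
}

class AgriculturalCrop {
    final Integer daysAfterPlanting; // Integer in the other team's model

    AgriculturalCrop(Integer daysAfterPlanting) {
        this.daysAfterPlanting = daysAfterPlanting;
    }
}
```

Every additional pair of models needs another such bridge, which is the cost that shared, extensible schemata aim to reduce.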

The DARPA Agent Markup Language (DAML)

The Defense Advanced Research Projects Agency (DARPA) began the DAML

project (van Harmelen et al. 2001) with the intention of facilitating the W3C working

group's vision of a future Semantic Web, essentially a gradual transformation of

Internet-accessible content (web sites, multimedia, embedded knowledge) from the present

HTML standard to one of the more descriptive standards derived from XML. A major

functional extension that DAML adds to its XML+RDF foundation is the ability to derive

logical conclusions not explicitly stated in the DAML document itself. Allowing the

equivalence of identifier terms (e.g., ageOfCrop and daysAfterPlanting) along with

distinction of unique terms furthers the goal of DAML to provide additional

expressiveness over its predecessors (Heflin et al. 2001).









Ontology Inference Layer (OIL)

The OIL description language specification (2000) was released by On-To-

Knowledge, another consortium of university and industry researchers bent on facilitating

computerized information exchange. Rather than an attempt at consolidating previous

language efforts, OIL comprises multiple layers of ontology design and understanding

specific to Web-enabled access to and sharing of contextual content. Each successive

information layer is accessible by a specific OIL application, but the design is such that

lower OIL levels (those with less content description and expressiveness) can be at

least partially understood by any OIL-ready application. As with DAML, XML+RDF(S)

is the core foundation of OIL. The layered architecture of OIL paired with its similarity to

DAML project efforts ultimately led to a merger of modeling concepts, called

DAML+OIL (sometimes referred to simply as DAML). In turn, DAML+OIL became the

foundation for yet another W3C working group revision, the Web Ontology Language

(OWL), currently under development.

Advanced Ontology Operations

Modeling knowledge in the ALS domain is only the first of several future roles

intended for the ALS ontology. Figure 3-3 illustrates the pyramidal hierarchy of ALS

ontology development leading from its initial knowledge model, which is completed, to

its eventual acceptance as a trustworthy source of scientifically proven information

suitable for cross-disciplinary machine learning usage (Li 2002).

Domain logic can be derived either from statements expressed in a specialized

ontology language (e.g., DAML+OIL) or by means of an application capable of

extrapolating logical statements from an underlying semantic organization. Ideally, any

ontology would be concise enough to allow its full set of inherent logic to be derived



















Figure 3-3. Advanced and future roles for developed ALS ontology


while minimizing redundancy within the ontology itself. An ALS-specific example might

be a program capable of automatically classifying new experiments based on their known

characteristics and attributes.

A richly expressed ALS ontology and its companion object data set allow for the

possibility of automatic theorem learning via traditional machine learning techniques:

backpropagation-based artificial neural network (ANN) algorithms, inductive inference

by decision tree analysis (e.g., C4.5, ID3), and non-parametric statistical techniques for

histogram data sets which cannot be assumed to follow a Gaussian (normal) distribution.

Through methods such as these, the ALS ontology and others can provide a basis for

future development of accelerated automation of proof testing and learning. For example,

an ALS-related program might suggest crop growth model equations based on previously

observed plant responses to controlled input parameters.

Regardless of the level of detail of an ontology, its contents may fall under heavy

scrutiny if it originates from a questionable source. Evolving standards in digital

signatures and assigned statistical probabilities of individual constituent knowledge

components will likely be the forerunners in the tools used to convince the scientific

community and general public of the validity and usefulness of an ontology's modeled

knowledge. Today's search engines and site indices, for example, cannot limit searches to









trusted data, nor can they attempt to understand the context or meaning of the data they

scan (Lopatenko 2001). Familiarity with the Internet's propensity for rapidly spreading

ambiguous or false information (news rumors, urban legends, celebrity gossip) should

deter researchers from too quickly accepting results returned from conventional

search engine queries. Scientific literature and research traditionally have not been

viewed as fields rife with falsely reported results or malicious self-interest; however, the

proliferation of electronic resources such as domain ontologies available on the Internet

could be seen as potential breeding grounds for a host of ill-intended information sources

to reside in. The respected scientific process of peer review would falter if trust and

certainty levels cannot be successfully established for ontology-based knowledge sources.

An equally unfortunate scenario would arise if a proliferation of untrusted information

sources led to restricted access, leaving researchers to reach consensus on flawed

concepts because they could not adequately share their research with the scientific

community at large.

Implementation of ALS Ontology

Although all of the previously mentioned ontology description languages are non-

proprietary open standards, academia and industry have yet to agree on which tools to

standardize upon for production environments. Since the success of an ontology standard

somewhat broadly depends on its general acceptance and usage, the jury remains out on

which family of description languages, if any, will in time prevail. Not everyone is

convinced that the Semantic Web is a viable goal, however. Critics point out that in order

for the idea to proliferate, independent researchers and ultimately the general public will

need to retrofit their existing HTML or replace it completely to allow true semantic

content analysis by a machine. These computerized Web Agents likely will not feel at









home on the now-dominant HTML portions of the Internet, which are designed for

interpretation by humans. Maximizing machine readability of data may come at the cost

of sacrificing oft-abused HTML markup entirely as a means of Web-based document

storage and communication (Clark 2003).

As the ALS ontology is primarily intended to serve as a source of assistance for

database queries (by humans or machines) and only as a starting point for a complete

domain-level ontology later suitable for interdisciplinary research sharing, no particular

ontology language described thus far was an obvious favorite. This observation motivated the

selection of an in-house departmental UML-like design environment, Web Taxonomy

(IFAS Information Technologies, University of Florida, 2000), for the ontology

construction. The design environment is capable of XML input/output (for cross-platform

data transport if required) and is able to serve as a graphical ontology navigation and

query tool. Using generic language tools instead of one of the still-evolving ontology

language standards may help lengthen the shelf life of the ontology and prolong its

usefulness for future software development and information sharing efforts by the ALS

research staff.

In a similar but more exhaustive procedure than the object database design

discussed in Chapter 2, creating an ontology requires meticulous consideration of the

types of objects (classes) to be included in the modeled ALS information domain. A

practical starting point was to use the two dozen classes already defined in the ALS

object database (e.g., Experiment, Crop, Researcher) as a foundation to build upon.

ALS-related publications (experiment protocols, results, technical memorandums, and

equipment descriptions) and interviews with ALS staff researchers provided the

remaining details to be embedded within the ontology schema.

The manual classification and insertion of the collected ALS terms included

consideration of all conceivable synonyms or alternate phrasings. For instance, LED,

light-emitting diode, and LED lamp all describe the same ALS concept. The graphical

depiction of ontology terms as a hierarchical grouping of related concepts was made

possible by the Web Taxonomy application itself. This mode of direct visualization

helped facilitate the rapid construction of ontology branches, as illustrated in Figure 3-4.


Key: parameter          Concept:

thing > parameter >
    barometric pressure
    electrical conductivity
    photoperiod
    photosynthetic photon flux
    photosynthetically active radiation
    relative humidity
    temperature
Figure 3-4. Web Taxonomy as an ontology visualization and editing tool


Likewise, multiple hierarchical levels of taxonomic classification allow terms in the

new ALS ontology to more naturally represent real-world concepts. Navigation along the

subsumption (superclass/subclass) relationships between terms is straightforward, as

depicted in Figure 3-5.
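Navigation along subsumption relationships can be sketched as a simple tree walk. The term names below are taken from Figure 3-5, and the parent/child placement is what the figure appears to show; the Term class and its methods are hypothetical and do not represent the Web Taxonomy implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a taxonomy with superclass/subclass navigation.
class TaxonomySketch {
    public static void main(String[] args) {
        Term thing = new Term("thing", null);
        Term lightSource = thing.addChild("light source");
        Term artificial = lightSource.addChild("artificial light source");
        lightSource.addChild("sun");
        System.out.println(artificial.pathFromRoot());
        System.out.println(artificial.isSubsumedBy(thing));
    }
}

class Term {
    final String name;
    final Term parent;                              // superclass term, null for the root
    final List<Term> children = new ArrayList<>();  // subclass terms

    Term(String name, Term parent) {
        this.name = name;
        this.parent = parent;
    }

    Term addChild(String childName) {
        Term child = new Term(childName, this);
        children.add(child);
        return child;
    }

    // True when the given term is this term or one of its ancestors.
    boolean isSubsumedBy(Term ancestor) {
        for (Term t = this; t != null; t = t.parent) {
            if (t == ancestor) {
                return true;
            }
        }
        return false;
    }

    String pathFromRoot() {
        return parent == null ? name : parent.pathFromRoot() + " > " + name;
    }
}
```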

For the purpose of the ALLSTAR database application project, the ALS ontology

can be used to yield a set of documents derived from either the underlying relational or

object databases. Future ontology analysis layers, however, such as the proof, logic and

trust modules discussed earlier, may allow the ALS ontology to provide a trusted answer

to a natural-language query posed by a human or machine client.


1 : [n] any of a set of physical properties











ALS experiment
atomic element
blss
environment
environmental parameter
light source
molecular compound
nutrient delivery system
organic matter
plant
sensors

Earth surface
Mars surface
Moon surface
plant growth chamber

artificial light source
sun

organic plant matter
agricultural crop
weed


Figure 3-5. Multiple levels of taxonomic classification in the ALS ontology


thing

bulb
fruit
leaf
pod
root
seed
shoot
tuber














CHAPTER 4
ALLSTAR APPLICATION SOFTWARE DEVELOPMENT

With the ALS object database model and domain-level ontology completed, the

next step in the development of the ALLSTAR project was the design and

implementation of a prototype application capable of storing, accessing, querying and

visualizing the combination of data sources in an understandable and useful fashion. As

the object database model was written using UML tools, it made sense to choose a

corresponding object-oriented programming language to uphold the desired transparency

between the application and the underlying database. Between C++ and Java, the two

most prominent object-oriented application programming languages presently available,

Java was chosen for its demonstrated ease of developing Internet-ready applications and

its cross-platform execution ability. Graphical user interface (GUI) and data visualization

components were constructed using Java's Swing class library, client/server interaction

was implemented via Java's Remote Method Invocation (RMI) library, and deployment

integrity assurance was handled by Java Web Start (JWS). Each of these topics is

discussed separately below.

Application Design Considerations

Software engineering guidelines predict that the more time that is spent on the

design phase of an application project, the less time is needed in its final adjustments and

maintenance phases. A number of design approaches were attempted and subsequently

discarded; therefore, the current ALLSTAR version reflects only those programming

concepts and algorithms best suited for integrating the primary goals of the project:









expanding the capability for human researcher access to the ALS data through a new

query engine, and establishing a sound foundation for future development efforts related

to machine readability and knowledge-sharing capability of the ALS information domain.

Query Processing

The Java source code classes created from the UML schema as discussed in

Chapter 2 have no interobject query capabilities of their own. Instead, they rely on

whichever object query language (OQL) implementation is provided by the

vendor-specific object database software. In this case, the ALS object database is served

by the ObjectStore PSE Pro variant of OQL. It overlaps with, and is therefore said to be

compliant with, the object storage industry standards promoted by the Object Data

Management Group (ODMG), of which ObjectStore's parent company is a member. It

should be noted that the similarities among query languages of different object database

platforms indicate a smooth transition if ever required, rendering negligible the concern

for maintaining a non-proprietary solution whenever possible.

The default set of query tools included with the object database software required

the manual programmatic extension of its capabilities to meet the ALLSTAR application

goals; specifically, ALLSTAR needed to use the attributes and relationships of each class

as navigational tools for graphical query construction. Also desirable was the ability to

dynamically filter out unwanted attributes specific only to the internal object storage

mechanisms and not needed for display purposes. A final feature sought was the ability

to chain together any number of object queries of various types, a cornerstone

ALLSTAR feature made possible only by the evolution from the legacy relational

database to the newly developed object model. To implement these improvements to the

default query tools, the required subroutines were coded in a previously undefined ALS-









domain object, arbitrarily called RootObject. By then modifying the original object data

model, each ALS object class can trace its superclass roots back to RootObject along its

inheritance path. Doing so allowed any newly defined ALS objects to inherit behavior

provided by the customized query processing engine. If these improvements were instead

implemented in the main executable code sections, individual ALS object instances

would be deprived of the ability to attach themselves to a query-in-progress by examining

their own contents (attribute/value pairs of any data type, primitive or user-defined).
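One way a common root class could let every object examine its own attribute/value pairs is Java reflection. The sketch below is an assumption about how such behavior might be written, not the ALLSTAR implementation: the searchField method body, the internalStorageId field, and the Crop subclass are all hypothetical. Fields declared on RootObject itself are skipped, which stands in for filtering out attributes used only by the storage mechanism.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical sketch: query behavior inherited from a common root class.
class RootObjectSketch {
    public static void main(String[] args) {
        Crop wheat = new Crop("wheat");
        System.out.println(wheat.searchField("WHEAT"));  // case-insensitive match
        System.out.println(wheat.searchField("potato")); // no match
    }
}

abstract class RootObject {
    // Hypothetical storage bookkeeping; never exposed to searches.
    private final long internalStorageId = 42L;

    // Returns true if any attribute value of this object contains the keyword.
    boolean searchField(String keyword) {
        String needle = keyword.toLowerCase();
        // Walk from the concrete class up to, but excluding, RootObject itself.
        for (Class<?> type = getClass(); type != RootObject.class; type = type.getSuperclass()) {
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                field.setAccessible(true);
                try {
                    Object value = field.get(this);
                    if (value != null && value.toString().toLowerCase().contains(needle)) {
                        return true;
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return false;
    }
}

class Crop extends RootObject {
    private final String commonName;

    Crop(String commonName) {
        this.commonName = commonName;
    }
}
```

Because the behavior lives in the root class, a newly defined subclass gains it without any additional code.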

For an ALLSTAR client, a database query can now appear to be a transparent layer

atop movement (either in tabular or graphical format, as explained in a later section)

through the structure of the database itself. A user may associate this navigation with

zooming in, or drilling down, from a general concept to a specific data subset sought

for analysis. With the exception of the information contained in the relational database

(e.g., growth chamber measurements, nutrient solution measurements, harvest data), the

object database contents become the data of interest. Instead of flat-file relational entities,

results of a database query now entail viewing the properties and relationships of one or

more objects that match a specified search term or concept.

Consider the query example from Chapter 1, Figure 1-3: again a researcher wishes

to extract a list of all ALS experiments conducted by primary investigator Gary Stutte

that involved wheat as a candidate crop and seed positioning as a primary study type.

With the new query tools in place, however, prior knowledge about the structure of the

object model is not needed. Instead, a researcher can make use of the semantic model

embedded within the object classes to intuitively navigate to an answer (Figure 4-1).
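The example query can be sketched as plain navigation over object references. This is an illustration only and does not use the ObjectStore query API: the attribute names follow Figure 4-1, the StudyType class name and the find method are assumptions, and the two sample experiments are made-up data. Only the three search terms come from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: answering the example query by following object references.
class ObjectQuerySketch {
    public static void main(String[] args) {
        // Made-up sample data; only the search terms below come from the text.
        Experiment a = new Experiment("sample experiment A",
                Arrays.asList(new Researcher("Stutte")),
                Arrays.asList(new Crop("wheat")),
                Arrays.asList(new StudyType("seed positioning")));
        Experiment b = new Experiment("sample experiment B",
                Arrays.asList(new Researcher("Stutte")),
                Arrays.asList(new Crop("lettuce")),
                Arrays.asList(new StudyType("seed positioning")));
        for (Experiment hit : find(Arrays.asList(a, b), "Stutte", "wheat", "seed positioning")) {
            System.out.println(hit.description);
        }
    }

    // Keeps experiments that satisfy all three conditions (the "-and-" links in Figure 4-1).
    static List<Experiment> find(List<Experiment> experiments,
                                 String lastName, String cropName, String studyType) {
        List<Experiment> hits = new ArrayList<>();
        for (Experiment e : experiments) {
            boolean byInvestigator = e.primaryInvestigators.stream()
                    .anyMatch(r -> r.lastName.equalsIgnoreCase(lastName));
            boolean byCrop = e.cropTypesStudied.stream()
                    .anyMatch(c -> c.commonName.equalsIgnoreCase(cropName));
            boolean byStudyType = e.studyTypes.stream()
                    .anyMatch(s -> s.description.equalsIgnoreCase(studyType));
            if (byInvestigator && byCrop && byStudyType) {
                hits.add(e);
            }
        }
        return hits;
    }
}

class Researcher {
    final String lastName;
    Researcher(String lastName) { this.lastName = lastName; }
}

class Crop {
    final String commonName;
    Crop(String commonName) { this.commonName = commonName; }
}

class StudyType {
    final String description;
    StudyType(String description) { this.description = description; }
}

class Experiment {
    final String description;
    final List<Researcher> primaryInvestigators;
    final List<Crop> cropTypesStudied;
    final List<StudyType> studyTypes;

    Experiment(String description, List<Researcher> primaryInvestigators,
               List<Crop> cropTypesStudied, List<StudyType> studyTypes) {
        this.description = description;
        this.primaryInvestigators = primaryInvestigators;
        this.cropTypesStudied = cropTypesStudied;
        this.studyTypes = studyTypes;
    }
}
```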









Since many ALS-domain objects contain attribute values of non-primitive types

(i.e., not concrete data types such as string literals, integers, dates or floating point

numbers), the resultant interobject database navigation often yields additional objects of

relevance or interest to an ALLSTAR user. Previous queries based on the legacy

Experiment: primaryInvestigators, description, startDate, endDate, cropTypesStudied, studyTypes, ...
    primaryInvestigators -> Researcher: firstName, lastName (Find: Stutte), emailAddress, workPhone, department
    -and-
    cropTypesStudied -> Crop: commonName (Find: wheat), cultivars
    -and-
    studyTypes -> Chamber Study Type: description (Find: seed positioning)

Figure 4-1. ALS domain database query with object-based navigation


relational database are restricted to requests for information based only on the relational

entity's attribute names and data ranges. Object-based queries empower a user with the

ability to pose any series of questions using keyword matches related to the object model

and its higher level of intuitiveness. To consider the usefulness of object-based queries, a

brief list of sample queries is shown in Table 4-1.

Table 4-1 illustrates several key concepts. First, the object query suffers the

limitation of not being able to query the relational data, as it is stored separately and is

accessible only via traditional SQL queries. Data-specific information such as particular









numerical ranges of oxygen pressure measurements cannot be included in an object

query. Relational data is accessible but not queryable using the ALLSTAR program.

Only objects in the object database, and thus defined in the object model, can be

included in an object query, revealing a considerable limitation of the decision to keep

some ALS data separate in its original relational form. This obstacle prevents, for

instance, the ALLSTAR application from being able to process the last query listed in

Table 4-1. Database queries in the ALS domain

Sample query [1]                                                       Implementation method
---------------------------------------------------------------------  ---------------------
Find all ALS crop growth experiments involving wheat.                  Relational or object
Find wheat experiments in which harvested plants exceeded 0.5 m in     Relational
  height.
Find experiments whose primary investigator(s) published journal       Object
  article(s) involving potato growth.
Compare the gas exchange measurements between the two levels of        Relational or object
  the BPC [2] for lettuce experiments conducted in both levels
  simultaneously.
Determine a correlation coefficient between the variance of an         Neither
  experiment's recorded harvest data and the average annual number
  of journal publications written by the experiment's primary
  investigator.


Table 4-1. Despite the unconventional nature of this query and its likely irrelevance with

regard to ALS experiment analysis, the importance of being able to formulate arbitrary

database queries is here again emphasized. Traditional query types should not be allowed

to wholly dictate the ability of the query engine. Future efforts in automated theorem

proving can take advantage of this query flexibility and possibly demonstrate unintuitive

1 Note that processing of natural-language syntax queries is not supported in the current ALLSTAR
version. Keyword pattern matching is used.

2 Biomass Production Chamber. The BPC is the centerpiece of the ALS plant biology experiments at the
Kennedy Space Center.









correlations among seemingly unrelated components. Placing undue emphasis on irrelevant

correlations discovered in small data sets (tilting at windmills, chasing rabbits, and

similar indelicate descriptions) remains a problem best left for human

researchers to sift through. Several ALS staff researchers have indicated concern

regarding ALLSTAR's potential for generating a wealth of tangential suggestions related

to data set correlations, all of which would need to be analyzed further if even one can be

ultimately proven correct (ALS Survey 2003). Uncertainty analysis measurements of

completed ALS experiments can now be integrated with additional ALS-domain

knowledge (manufacturer specifications of sensors, growth chambers, and nutrient

solution tanks, for example), made possible by the flexibility of the underlying object

query processor.

As discussed in Chapter 3, the ALS domain ontology serves two roles: a schema

providing data types for annotation of existing content, and a semantic context providing

background knowledge of a specific domain (Euzenat 2002). The first of these roles

holds direct significance to the ALLSTAR query processor; the second role serves only

as a backdrop for future ALS-related software development. When a user (human or

machine) inputs one or more search terms, the contents of both the domain ontology and

object database can be iteratively searched for matching patterns. Should a term match

the name of an object database class, the user is directed to the database node containing

the result object type(s). If no database match is found, the ontology then steps in as a

reserve source of information to further the query. Synonyms, partial keyword matches

and related terms can all be analyzed for possible matches with existing object model

classes. If still no matches are found, the user is encouraged to browse further for a









possible link to the information sought. This approach seeks to minimize the amount of

background information (i.e., expert knowledge) a user must expect to exercise during

a typical database search-and-retrieval operation. To briefly illustrate one potential aspect

of this query assistance approach, consider the hypothetical synonym lookup example

shown in Figure 4-2.




Search dialog: "Enter a search string to look for:" [light-emitting diode]  (OK | Cancel)

Search term              Object Database               ALS Domain Ontology
"light-emitting diode"   No matches found.             Found: light-emitting diode
                                                       Synonym: LED lamp
"LED lamp"               No matches found.             Synonym: LED
"LED"                    8 Protocol matches
                         10 Parameter matches
                         6 Journal Article matches
                         56 Experiment matches

Figure 4-2. Sample synonym lookup in ALS domain ontology
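
The lookup sequence of Figure 4-2 can be sketched in a few lines of Java. The sketch below is illustrative only and is not taken from the ALLSTAR source: the class name SynonymLookup, its methods, and the use of in-memory maps in place of the object database and the domain ontology are all assumptions made for the example. A search term is first checked against the database; if nothing matches, the ontology supplies a synonym and the check is repeated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of ontology-assisted term lookup; not ALLSTAR's actual code. */
final class SynonymLookup {
    // Stand-in for the object database: search term -> number of matching objects.
    private final Map<String, Integer> databaseMatches = new LinkedHashMap<>();
    // Stand-in for the domain ontology: term -> one synonym.
    private final Map<String, String> ontologySynonyms = new LinkedHashMap<>();

    void addDatabaseTerm(String term, int matchCount) {
        databaseMatches.put(term.toLowerCase(), matchCount);
    }

    void addSynonym(String term, String synonym) {
        ontologySynonyms.put(term.toLowerCase(), synonym);
    }

    /** Number of database objects matching the term, or 0 if there are none. */
    int matchCount(String term) {
        Integer count = databaseMatches.get(term.toLowerCase());
        return count == null ? 0 : count;
    }

    /**
     * Returns the chain of terms tried, in order. The chain stops at the first
     * term that matches the database, or when the ontology has no further
     * synonym to offer (in which case the user would be asked to browse).
     */
    List<String> resolve(String searchTerm) {
        List<String> tried = new ArrayList<>();
        Set<String> seen = new LinkedHashSet<>(); // guards against synonym cycles
        String current = searchTerm;
        while (current != null && seen.add(current.toLowerCase())) {
            tried.add(current);
            if (matchCount(current) > 0) {
                break; // database match found; stop consulting the ontology
            }
            current = ontologySynonyms.get(current.toLowerCase());
        }
        return tried;
    }

    public static void main(String[] args) {
        SynonymLookup lookup = new SynonymLookup();
        lookup.addSynonym("light-emitting diode", "LED lamp");
        lookup.addSynonym("LED lamp", "LED");
        lookup.addDatabaseTerm("LED", 80); // 8 + 10 + 6 + 56 matches in Figure 4-2
        System.out.println(lookup.resolve("light-emitting diode"));
    }
}
```

Keeping the list of attempted terms, rather than returning only the final match, lets an interface show the user how the original search string was rewritten, as the two-column display in Figure 4-2 does.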


Database Navigation

Two primary modes of database traversal have been included in the ALLSTAR

prototype application: a more conventional tabular style, and a somewhat experimental

graphical navigation system. Screenshots are provided in Figures 4-3 and 4-4.

Aesthetics of both methods obviously vary, and while the ALLSTAR application is

not intended as an exercise in commercial-quality GUI design, advantages of each















Re-sort these results by: Description

Select from the 76 Description choice(s) below:

Effect of inoculated system for processing graywater by hydr ...
Effect of media size & NDS management growth of Super Dwarf ...
Effect of osmocote concentration on germination & growth of ...
Effect of oven-dried feed compost leachate on growth and yie ...
Effect of oven-dried feed material on nutrient recovery from ...
Effect of processing of compost leachate on growth of wheat
Effect of seed position, culture, and NDS Pressure on Germ o ...


Figure 4-3. Conventional tabular style of object database navigation


[Screenshot not reproduced. Two object nodes are shown. The first lists the attributes Images, Identification Code, Description, Program Involved, Experimental Protocols, Start Date, End Date, Conducted At, Crop Types Studied, Cultivars Studied, Plants, Chambers Used, Environment Setpoints, Nutrient Solutions Used, Solution Setpoints, Growth Chamber Study Types, Researcher Comments, and Is Data Validated. The second lists Images, First Name, Department, Email Address, Work Phone, Experiments Involved With, and Publications.]


Figure 4-4. Graphical object database navigation




navigation method can nonetheless be intuited. The tabular navigation style ensures that


all relevant attribute data (every existing attribute value) will be displayable inside a


single on-screen container object. A user should therefore expect little trouble searching


for a specific value or range, as each attribute value list is alphanumerically sorted, using











a modified Quicksort algorithm (Hoare 1962) to enhance application performance. Users

choosing the graphical navigation style, however, cannot be assured of the on-screen

location of their sought attributes or values, since these locations depend on the

components of the query already in progress. The strength of the graphical method of

database traversal lies in its ability to allow at-once visualization of a larger portion of the

object database model and its object interrelationships. Studies of human cognitive

capacity suggest that a person's task analysis abilities can be enhanced by grouping

relevant information (logically related database objects, in this case) and that this visual

grouping can decrease the time spent understanding a query in progress (Lee 1993).
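
As a brief illustration of the sorting step mentioned above, the following Java sketch applies a plain Quicksort to a list of attribute values. The modifications made in ALLSTAR are not described here, so none are reproduced; the class name AttributeValueSorter and the choice of case-insensitive comparison (one possible reading of "alphanumerically sorted") are assumptions of the example.

```java
import java.util.Arrays;

/** Illustrative Quicksort over attribute-value strings; not the ALLSTAR implementation. */
final class AttributeValueSorter {

    /** Sorts the array in place, ignoring letter case (an assumed ordering). */
    static void sort(String[] values) {
        quicksort(values, 0, values.length - 1);
    }

    private static void quicksort(String[] a, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one element: already sorted
        }
        String pivot = a[lo + (hi - lo) / 2]; // middle element as pivot
        int i = lo;
        int j = hi;
        while (i <= j) {
            // Advance i past values smaller than the pivot, j past larger ones.
            while (a[i].compareToIgnoreCase(pivot) < 0) i++;
            while (a[j].compareToIgnoreCase(pivot) > 0) j--;
            if (i <= j) {
                String tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                i++;
                j--;
            }
        }
        quicksort(a, lo, j); // left partition
        quicksort(a, i, hi); // right partition
    }

    public static void main(String[] args) {
        String[] values = {"Tank", "Chamber", "Sensor", "crop", "Plant"};
        sort(values);
        System.out.println(Arrays.toString(values));
    }
}
```

In current Java versions the same ordering is available from Arrays.sort with String.CASE_INSENSITIVE_ORDER; the hand-written version is shown only to make the partitioning step visible.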

On-Screen Visualization of Data

In an effort to minimize the potential confusion of navigating an object database,

object instances are shown by class name and attribute/value pairs. To decrease screen

clutter, relationships are not explicitly shown. Clicking on an ALS-domain attribute value

expands the view to include the new object(s) of interest. Figure 4-5 shows a database

object instance (of class type Parameter), illustrating its tabular view of attributes and

corresponding values.

To show the potential for an integrated Java application capable of object database

access and relational data display, ALLSTAR allows graphical and tabular displays of


Description            Dimethoxydimethylsilane
Abbreviation           DIMETHOXYSI
Measurement Unit Type  ppb
(1 of 1 Parameter result(s))


Figure 4-5. Tabular display of object instance in database













relational data as found in the parametric sensor measurements of each growth chamber

and nutrient solution tank. The use of JFreeChart (2003), an open-source Java class

library of charting and graphing tools, as a vehicle of graphical data display further

exemplifies Java's ease of class extensibility. The ability to include third-party source

code libraries when appropriate and their subsequent integration into the project code

typifies Java programming projects such as ALLSTAR. Figure 4-6 briefly shows a

graphical view of ALS growth chamber temperature measurements, using the integrated

JFreeChart library.

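[Figure 4-6: JFreeChart line plot of ALS growth chamber temperature measurements; chart image not reproduced. Surviving y-axis tick labels run from 21.75 to 23.25 in steps of 0.25.]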











Relational data from the ALS database (including 5-minute growth chamber

sensor readings, nutrient tank readings, and harvest data) can also be exported from

ALLSTAR in a tab-delimited format for use in external applications if higher-level

analysis is needed. Likewise, visual graph plots can be saved as image files. All graphs

allow threshold selection via mouse-based zoom controls. By default, daily average

values are initially displayed for 5-minute sensor data due to database size considerations.
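
The two operations described in this paragraph, reducing 5-minute readings to daily averages and writing the result as tab-delimited text, can be sketched as follows. This is a minimal example under stated assumptions rather than the ALLSTAR export code: the class SensorExport, the Reading type, the ISO-style date strings, and the column headers are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of daily averaging and tab-delimited export. */
final class SensorExport {

    /** One sensor reading; "day" is a date string such as "2002-03-01" (assumed format). */
    static final class Reading {
        final String day;
        final double value;

        Reading(String day, double value) {
            this.day = day;
            this.value = value;
        }
    }

    /** Averages the readings of each day; the TreeMap keeps days in sorted order. */
    static Map<String, Double> dailyAverages(List<Reading> readings) {
        Map<String, double[]> sums = new TreeMap<>(); // day -> {sum, count}
        for (Reading r : readings) {
            double[] acc = sums.get(r.day);
            if (acc == null) {
                acc = new double[2];
                sums.put(r.day, acc);
            }
            acc[0] += r.value;
            acc[1] += 1;
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    /** Renders the averages as tab-delimited text with a header row. */
    static String toTabDelimited(Map<String, Double> averages) {
        StringBuilder out = new StringBuilder("date\tdaily_average\n");
        for (Map.Entry<String, Double> e : averages.entrySet()) {
            out.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

Averaging before display reduces the number of points sent to the client from as many as 288 per day (one reading every five minutes) to one, which is the database size consideration noted above.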









Graphical User Interface Considerations

From a human researcher's perspective, the ALLSTAR prototype application can

maximize its usefulness by allowing different modes of access to every data record

(object or relational, single-record or data-subset) and by displaying multiple

simultaneous conceptual relations or records. Doing so allows researchers to "get their

hands around the data and results," potentially decreasing the time necessary to retrieve

pertinent results from the database itself (DeCoste 2001).

The ALLSTAR application uses customized extensions of standard Java Swing

components for its GUI construction, forgoing the need to build display components and

object containers completely from scratch.

Application Deployment Issues

The Java programming language presents its own unique set of deployment issues

necessary for consideration. Originally, the ALLSTAR project started out as a simple

applet. Over time, however, it outgrew the applet sandbox (the program execution

restrictions imposed upon all applets) and required a shift to become a full-fledged Java

application.

Security Issues

By default, all Java programs that did not originate from a user's own computer are

forbidden from carrying out many system-related tasks: disk read/write access, cut/paste

operations to the operating system clipboard, and remote access of network resources, to

name several. These limitations were overcome by creating self-signed (homemade) digital certificates

(as opposed to fee-based, privately authenticated security certificates) and digitally

signing all Java files associated with the ALLSTAR application. Doing so allows the

program full rights to a user's system, assuming the user understands and agrees to a









mandatory security warning that appears before the program's first-time execution on a

workstation.

Instead of burdening ALLSTAR users with a large file download size or tying up

their CPU time unnecessarily, a client/server approach was adopted for the ALLSTAR

design and subsequent deployment. Java's Remote Method Invocation (RMI) library was

used to permit any ALLSTAR client program to connect to and communicate with the

centralized ALLSTAR server utility. The server handles incoming requests for data query

analysis or data table lookups, and the client concerns itself primarily with GUI elements

such as data display and graphing tools. An updated depiction of the ALLSTAR system

from Figure 3-1 now appears in Figure 4-7.
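
The division of labor between client and server can be illustrated with a small RMI sketch. Everything named below (the QueryService interface, its findExperiments method, the in-memory implementation, and the registry name "QueryService") is hypothetical and chosen for the example; the actual ALLSTAR remote interface is not shown in this chapter. Only standard java.rmi calls are used.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical remote interface. In a real deployment it would normally be
 * declared public in its own source file so that clients in other packages can use it.
 */
interface QueryService extends Remote {
    List<String> findExperiments(String keyword) throws RemoteException;
}

/** Server-side implementation; a real server would query the databases instead of a list. */
final class InMemoryQueryService implements QueryService {
    private final List<String> descriptions;

    InMemoryQueryService(List<String> descriptions) {
        this.descriptions = new ArrayList<>(descriptions);
    }

    @Override
    public List<String> findExperiments(String keyword) {
        List<String> hits = new ArrayList<>();
        for (String d : descriptions) {
            if (d.toLowerCase().contains(keyword.toLowerCase())) {
                hits.add(d);
            }
        }
        return hits;
    }
}

final class RmiSketch {
    /** Server side: starts a registry in this JVM, exports the service, and binds it by name. */
    static Registry startServer(QueryService service, int port) throws RemoteException {
        Registry registry = LocateRegistry.createRegistry(port);
        Remote stub = UnicastRemoteObject.exportObject(service, 0);
        registry.rebind("QueryService", stub);
        return registry;
    }

    /** Client side: looks up the stub by name and invokes the remote method. */
    static List<String> queryServer(String host, int port, String keyword) throws Exception {
        Registry registry = LocateRegistry.getRegistry(host, port);
        QueryService service = (QueryService) registry.lookup("QueryService");
        return service.findExperiments(keyword);
    }
}
```

Because the client holds only the interface and a stub, the query logic and the database connections stay on the server, which is what keeps the downloaded client small.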

Compatibility Issues

Multiple Java runtime environments exist, one for every major operating system

currently in use: Microsoft Windows, Sun Solaris/Unix, Linux, and Apple Macintosh OS

are all supported. Cross-platform execution ability was among the features originally

influencing the selection of Java as the ALLSTAR application programming platform.

Although the application is designed to be accessed by any standard Web browser

program, the Java Web Start (JWS) utility was used to reduce the chance that a client

user's particular browser configuration would interfere with the ALLSTAR program's

proper execution or on-screen display. JWS allows an application programmer to forcibly

require a particular version of the Java Runtime Environment (JRE), in this case v1.4.1

or higher, before the program will open. JWS also frees the Java application from the

browser entirely, allowing a user to close any current browser window sessions and still

maintain connectivity with the ALLSTAR application. The combination of these JWS









features helps showcase ALLSTAR as a functional standalone application despite its

status as a lightweight RMI client program.


[Diagram not reproduced; the surviving labels are Relational Database and Object Database.]


Figure 4-7. Communication among the RMI client/server application components














CHAPTER 5
RESULTS AND DISCUSSION

The complexities involved with the integration of ALLSTAR Java programming,

UML modeling, and ontology design should not be allowed to overshadow the project's

relatively few stated goals: assisting both human and machine readability of and access to

the ALS data. The next sections discuss the results for each of these objectives separately, using

ALS staff feedback and general observations to explain the perceived advantages and

disadvantages to the approaches taken throughout the ALLSTAR project. ALS researcher

comments are the product of regular e-mail feedback, semi-annual on-site interviews, and

other personal communications conducted over a two-year period. Additional opinions

stem from solicited but anonymous ALS staff responses to an ALLSTAR user survey

primarily intended for ALS staff members (Appendix C).

Human Researcher Access of ALS Data

The first of the two primary ALLSTAR goals is the design and implementation of

an environment capable of facilitating ALS researcher access to both their existing

relational data and any supplementary knowledge encapsulating the ALS

domain in general. It was initially believed that the creation of an object model, if

combined with a method (in this case, ALLSTAR) of directly accessing the model, could

ease a researcher's task of gathering and analyzing data records of interest. ALLSTAR

does successfully implement its intended functions as described above; however, ALS

staff reactions have thus far been mixed, as explained in the next sections.









Application Performance

The ALLSTAR application, although intended only as a lightweight, Internet-

accessible database access/query interface, nonetheless pushes the limits of what is

typically expected by a user of a lightweight, Internet-accessible Java program. Presently,

it is not yet commonplace to download large Java programs that use windowing systems

similar to most graphical operating system environments. Java is most likely familiar to

Internet users as a web applet development language, powering small applet programs

that are typically embedded within a larger HTML-based web page. It is therefore worth

mentioning that although the ALLSTAR user interface resembles a typical window-based

application, its client/server architecture relies upon network bandwidth to run smoothly,

since most user interactions require communication with the remote database server. ALS

researchers are already accustomed to delay times (1 to 5 minutes is not unusual)

associated with querying their large-scale relational database. As such, researchers did

not appear to be fazed by the frequent but short time delays (typically 5 to 30 seconds)

incurred by many in-application commands and data requests.

Java is largely platform-independent, as program execution environments (i.e.,

Java run-time environments) have been developed for most commercially available

operating systems currently in use: Microsoft Windows, Sun Solaris/Unix, Linux, and

Apple Macintosh OS. Lack of reliance on an operating system comes at a price, however.

Java's bytecode structure (a compromise between original source code and completely

compiled binary files) requires the use of additional CPU operations to finish the

compilation-on-the-fly process. The result is that Java application performance can vary

considerably according to a workstation's CPU speed and memory specifications.

ALLSTAR is no exception; both its command response times and screen redraw/refresh behavior

were noticeably affected by different CPU/memory configurations using identical

operating system versions. Certainly a user's first impressions of ALLSTAR can be

marred by sluggish menu responses or display glitches, but it is believed that more robust

programming techniques may allow the application to depend less on CPU/memory than

does its current version. No ALS staff researchers mentioned application response time as

being a significant hindrance to using the program.

Online Accessibility of Data

ALS researchers were mostly in agreement in acknowledging the usefulness of

having their experimental data sets (and the newer object model) available online for in-

house use. Opinion was split, however, on whether any or all of the raw data observations

should be made publicly available. Concerns over the need to validate experimental

data (checking for equipment errors, handling data outliers) before its public release

were understandable. Of additional concern was the potential for misappropriation of the

ALS data by others: lack of proper citation, disassociation of a primary investigator from

a particular experiment, and conclusions drawn using invalid analysis techniques.

Counterarguments from the ALS staff nonetheless urged the importance of making as

much validated data available online as possible, if only to set an example for other

research environments.

Several ALS staff researchers agreed that using an object model in conjunction

with an ontology knowledge layer (i.e., using ALLSTAR) to query their database records

was preferable to their existing query tools. Others disagreed, blaming the

original relational database model for past suboptimal query performance, suggesting a

database remodeling process could benefit little from migrating to an object-oriented

platform. Staff sharing this opinion also speculated upon the small size of the ALS-









domain concept set (28 concepts are included in the UML object model) as being

insufficient to merit the development of a non-prototype version of ALLSTAR.

It should be noted that the ALLSTAR prototype development and subsequent

evaluation were conducted using only a subset (about 10%) of the complete ALS

relational database records, for purposes of server storage reduction and security

restrictions on transporting data outside of the KSC complex. In-house installation of

ALLSTAR at KSC was performed, however, to demonstrate the portability of the

ALLSTAR code across platforms and server installations. In this case, ALLSTAR was

shown to successfully query and retrieve results from the entire (100%) set of existing

ALS relational database records. This type of remote installation requires little or no

source code adjustment, and the ALLSTAR RMI server is compact enough to run

alongside other applications on a single dedicated network server.

Machine Access of ALS Data

If the benefits of ALLSTAR to data access by humans are difficult to measure and

clarify, attempting to measure the success of facilitating ALS data access by machines is

tougher and more abstract still. The ALLSTAR Java application itself poses little interest

to a machine-based client. Instead, the modular layers supporting the ALLSTAR

interface might be of some use to future software development efforts bent on knowledge

discovery and automated proof learning.

Having access to an object-based data layer that is itself partially populated using

a large underlying relational database should theoretically simplify a programmer's task

if a program capable of interfacing with the ALS data is desired. No applications yet exist

that are capable of directly interfacing with an unknown UML model and able to query

the persistent storage mechanism of an unknown object database layer. Likewise,









competing ontology language standards have hampered the development of applications

truly capable of interacting with two or more disparate domain ontologies and drawing

logical conclusions between them. Until these technologies mature further, the ALS

object database model and ontology layers remain underused resources, able only to

enhance the query abilities of human researchers interested specifically in ALS data.

Electronic Publication of Experimental Data

The field of scientific publishing is currently witnessing a rising interest in making

information available in electronic format. Some projects, such as the Open Archives

Initiative (2003), the Public Library of Science (2003), and the National Science Digital

Library (2003), additionally contend that the type of information available in peer-

reviewed or non-refereed scientific journals should be free of charge to all.

Indications of progress by a researcher in a particular field, in addition to

accomplishment of experimental findings for purposes of sharing knowledge or building

career reputation credentials, today remain largely defined by the quality and frequency

of a scientist's published works. It is unclear when this trend will shift toward

predominantly electronic publication, and hazier still is the long-term future of scientific

journals at all.

In the meantime, it is both unethical and illegal for a U.S.-based scientific journal

to attempt to lay copyright claim to experimental data records that form the basis of a

published report (17 US Code, 2002). An exception to this policy would be if an author

chose to print all of the data observations in the report itself, a possible scenario for small

data sets. A typical ALS plant growth experiment generates from 25,000 to 2,000,000

distinct observations; therefore, raw data inclusion in a journal destined for paper

distribution remains unlikely. Instead, the ALS data becomes an excellent candidate for









electronic publication in one or more monitored forms. Subsets of ALS data have already

been successfully used to create multimedia educational tools suitable for high school

students (BioBLAST 2002). NASA recently renewed its pledge to develop and support

educational outreach programs, sparking institution-wide interest in generating software

tools and NASA-related curricula for use in K-12 and university programs (O'Keefe

2002). It is therefore likely that subsets of the existing ALS data will continue to be made

available in alternative formats (such as through ALLSTAR) for purposes of specialized

interface design and improvement.

Future Directions of Research

Although the current ALLSTAR application version is not intended for public use,

it provides a framework upon which to construct additional information and interface

layers. Future research related to the ALLSTAR project would likely include

improvements to the interface design, enhancing program code robustness for greater

application performance, and expanding both the object data model and the domain

ontology to include greater embedded knowledge. The first two research goals would be

applicable to a short-term objective of developing in-house or outreach materials that

seek to demonstrate principles related to ALS experiments. The latter goal instead

pertains to the more distant objective of readying the ALS data set for access by an

automated search utility, perhaps a web agent of the proposed Semantic Web.

Of great interest to this project is the potential for automated agents to one day

integrate ALS expert knowledge with the ALS data set (in essence, combining

rudimentary components such as the ALS object data model and the domain ontology)

to draw logical conclusions about the natural world. These decision support systems will

draw upon the large ALS data store to assist in tasks based upon predictive reasoning








algorithms. One module might help detect equipment failure. Another artificial

intelligence module might suggest a new ALS experiment to narrow the uncertainty of a

particular set of crop growth equations. Still another could help automate the task of data

validation and outlier detection. Ultimately, crop growth equations will merge with

system models and simulations. With the help of automated reasoning tools such as those

hinted at by applications like ALLSTAR, modeling a feasible bioregenerative life support

system for a given set of mission considerations will be less daunting a task.























APPENDIX A

ENTITY-RELATIONSHIP ALS DIAGRAM


This section contains the entity-relationship (ER) diagram for selected relational



entities of the existing legacy ALS relational database. All lines express a one-to-many



relationship (many is depicted by the symbol) between two attributes of two distinct



entities.




[Entity-relationship diagram not reproduced. Attributes shown include ID_EXPERIMENT, ID_CHAMBER, ID_CROP_HISTORY, TX_CULTIVAR, and D_MEASUREMENT_DATE.]




Figure A-1. Entity-relationship diagram for existing ALS database









Each of the entities in the previous ER diagram is explained further in Table A-1.

Table A-1. Entity descriptions from existing ALS relational database schema


Entity            Description
----------------  ------------------------------------------------------------------
ALS CODES         Experiment study types, grouped by concept
ALS PARAMETERS    Environmental parameters: descriptions, abbreviations, measurement units
AUTO MEASUREMENT  Sensor types, locations, parameters measured
CHAMBER HISTORY   5-minute-interval recorded sensor data (e.g., CO2) from growth chambers during experiments
CHAMBER ORGANIC   Periodic measurements of organic gaseous compounds (e.g., ethylene) during experiments
CHAMBER SP        Parametric setpoints for each experiment, specific to a particular growth chamber
CULTIVAR          List of cultivars for each agricultural crop type
CULTIVAR LIST     List of which cultivars were studied in a particular experiment
DAILY COMMENTS    Periodic researcher comments about special actions taken (milestones, anomalies) during experiment
EXPERIMENT        Primary investigator, crops, start/end dates and descriptions of all ALS experiments
EXPT DETAIL       Similar to EXPERIMENT, but includes multiple entries for separate tanks within growth chambers
HARVEST           Assigns unique ID code to each individual plant harvested
HARVEST COMMENTS  Control/treatment description for each plant if relevant
HARVEST DETAILS   Fresh and dry biomass measurements for harvested plant material, separated by location on plant
LAB HEADER        Assigns unique ID code to hydroponic nutrient solution used in each tank of each experiment
LAB HISTORY       Periodic measurements of chemical or nutrient solution parameter from each tank solution
TANK HISTORY      5-minute-interval recorded data (e.g., temperature) from nutrient solution tanks during experiments
TANK SP           Parametric setpoints for each experiment, specific to a particular tank within a growth chamber














APPENDIX B
UNIFIED MODELING LANGUAGE ALS DIAGRAM

Following is the complete UML diagram for the newly generated ALS object

database model, with dependency relationships depicted as arrows between objects.

For clarification purposes, enlarged versions of each class also follow. Standard

UML class diagram components (as shown in each figure) contain the class name,

attribute names and their respective data types. Note that the RootObject class

additionally has a defined method, searchFields, and that all other classes inherit this

customized method for database query purposes as explained in Chapter 2.
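
To make the inheritance relationship concrete, the following Java sketch shows one way a root class could supply searchFields to every subclass. Only the method signature, boolean searchFields(String args), and the Parameter attributes come from Figure B-2; the reflection-based body, the use of public fields, and the case-insensitive substring match are assumptions made for illustration and may differ from the approach described in Chapter 2.

```java
import java.lang.reflect.Field;

/** Sketch of the root class; the search strategy shown here is assumed, not the thesis code. */
abstract class RootObject {

    /** Returns true if any public String attribute of this object contains the search text. */
    public boolean searchFields(String args) {
        String needle = args.toLowerCase();
        // getFields() returns the public fields of the runtime class, including inherited ones.
        for (Field field : getClass().getFields()) {
            if (field.getType() != String.class) {
                continue; // only String attributes are searched in this sketch
            }
            try {
                Object value = field.get(this);
                if (value != null && ((String) value).toLowerCase().contains(needle)) {
                    return true;
                }
            } catch (IllegalAccessException e) {
                // Skip attributes that cannot be read.
            }
        }
        return false;
    }
}

/** Mirrors the Parameter class of Figure B-2; it inherits searchFields unchanged. */
final class Parameter extends RootObject {
    public String description;
    public String abbreviation;
    public String measurementUnitType;

    Parameter(String description, String abbreviation, String measurementUnitType) {
        this.description = description;
        this.abbreviation = abbreviation;
        this.measurementUnitType = measurementUnitType;
    }

    public static void main(String[] args) {
        // Values taken from the object instance shown in Figure 4-5.
        Parameter p = new Parameter("Dimethoxydimethylsilane", "DIMETHOXYSI", "ppb");
        System.out.println(p.searchFields("silane"));
    }
}
```

Placing the method in the root class means a new domain class becomes searchable as soon as it is added to the model, with no per-class query code.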















[UML class diagram not reproduced; the individual classes and their attributes are listed in Figure B-2.]


Figure B-1. Complete UML diagram for ALS object database model





















ALSImageBinary

+ uploadedImageFileName: String




Book

+ bookChapters: BookChapter[ ]


BookChapter

+ book: Book


Figure B-2. Expanded descriptions of individual UML class diagram components


ALSImage

+ title: String
+ description: String
+ dateTaken: Date
+ image: ALSImageBinary


Chamber

+ chamberNumber: String
+ description: String
+ tanks: Tank[ ]
+ sensors: Sensor[]





Figure B-2 Continued




ConferenceProceeding

+ organizationName: String
+ proceedingFullName: String
+ locationCity: String
+ locationState: String
+ locationCountry: String
+ conferenceDate: String




Crop
+ commonName: String
+ cultivars: Cultivar[ ]
+ studiedInExperiments: Experiment[ ]
+ researcherComments: Document[]




Cultivar

+ name: String
+ description: String
+ ofCropType: Crop
+ researcherComments: Document[]




Date

+ year: String
+ month: String
+ day: String
























































Figure B-2 Continued


Document

+ title: String
+ documentBody: String
+ dateCreated: Date
+ authors: Researcher[ ]


Experiment
+ identificationCode: String
+ description: String
+ programInvolved: String
+ primaryInvestigators: Researcher[]
+ experimentalProtocols: ExperimentalProtocol[ ]
+ startDate: Date
+ endDate: Date
+ conductedAt: ResearchFacility[ ]
+ cropTypesStudied: Crop[ ]
+ cultivarsStudied: Cultivar[ ]
+ plants: Plant[ ]
+ chambersUsed: Chamber[]
+ environmentSetpoints: Setpoint[ ]
+ nutrientSolutionsUsed: NutrientSolution[ ]
+ solutionSetpoints: Setpoint[ ]
+ growthChamberStudyTypes: GrowthChamberStudyType[ ]
+ researcherComments: Document[ ]
+ isDataValidated: String


ExperimentalProtocol

+ experiment: Experiment[ ]
+ objective: String




















GrowthChamberStudyType

+ description: String




JournalArticle

+ journalName: String
+ volumeNumber: String
+ pageNumbers: String




NutrientSolution

+ description: String
+ solutionIDNumber: String
+ tanksUsedIn: Tank[ ]


Observation
+ date: Date
+ time: Time
+ observedValue: String



Figure B-2 Continued


GraduateResearch

+ degreeType: String
+ universityDepartment: String
+ universityName: String

























































Figure B-2 Continued


Parameter

+ description: String
+ abbreviation: String
+ measurementUnitType: String


Plant

+ experimentUsedIn: Experiment
+ trayLocatedIn: Tray
+ researcherPlantComments: Document[ ]
+ cultivarType: Cultivar
+ plantIDNumber: Integer
+ harvestMeasurements: Observation[]


Researcher

+ lastName: String
+ firstName: String
+ department: String
+ emailAddress: String
+ workPhone: String
+ experimentsInvolvedWith: Experiment[ ]
+ publications: Document[ ]


ResearchFacility

+ description: String
+ location: String
+ researchersInHouse: Researcher[]
+ growthChambers: Chamber[ ]

























































Figure B-2 Continued


RootObject

+ images: ALSImage[ ]

+ boolean searchFields(String args)


Sensor

+ description: String
+ serialNumber: String
+ parametersObserved: Parameter[]
+ chamberLocatedIn: Chamber[]
+ tankLocatedIn: Tank[ ]
+ currentlyInUse: String
+ firstActivationDate: Date
+ researcherComments: Document[]


Setpoint
+ startDate: Date
+ startTime: Time
+ endDate: Date
+ endTime: Time
+ setpointValue: String


Tank

+ description: String
+ nutrientSolution: NutrientSolution
+ trays: Tray[ ]
+ chamberLocatedIn: Chamber
+ sensors: Sensor[ ]


Time

+ hour: Integer
+ minute: Integer
+ second: Integer


Tray

+ description: String
+ trayPosition: String
+ tankLocatedIn: Tank
+ plantsInTray: Plant[]


TechnicalMemorandum

+ nasaTechMemoNumber: String
+ securityClassification: String
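
The Tank, Tray, and Plant classes in Figure B-2 refer to one another in both directions (a Tank lists its trays, and each Tray records the tank in which it is located). As an illustration only, the following Java sketch shows one way such paired references might be kept consistent. The field names follow the figure; the constructors, the addTray and addPlant helpers, the tank() navigation method, and the use of generic collections are assumptions made for this example and are not taken from the ALLSTAR source code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: field names follow Figure B-2, everything else
// (constructors, helper methods) is assumed for illustration.
class Tank {
    final String description;
    private final List<Tray> trays = new ArrayList<>();

    Tank(String description) { this.description = description; }

    // Adds the tray and sets its back-reference in the same step,
    // so the two sides of the association cannot drift apart.
    void addTray(Tray tray) {
        trays.add(tray);
        tray.tankLocatedIn = this;
    }

    List<Tray> getTrays() { return Collections.unmodifiableList(trays); }
}

class Tray {
    final String trayPosition;
    Tank tankLocatedIn;                 // set by Tank.addTray
    private final List<Plant> plantsInTray = new ArrayList<>();

    Tray(String trayPosition) { this.trayPosition = trayPosition; }

    // Same pattern one level down: Tray owns the list, Plant holds the back-reference.
    void addPlant(Plant plant) {
        plantsInTray.add(plant);
        plant.trayLocatedIn = this;
    }

    List<Plant> getPlantsInTray() { return Collections.unmodifiableList(plantsInTray); }
}

class Plant {
    final int plantIDNumber;
    Tray trayLocatedIn;                 // set by Tray.addPlant

    Plant(int plantIDNumber) { this.plantIDNumber = plantIDNumber; }

    // Navigates Plant -> Tray -> Tank; returns null if the plant is not placed yet.
    Tank tank() {
        return trayLocatedIn == null ? null : trayLocatedIn.tankLocatedIn;
    }
}

class ContainmentSketch {
    public static void main(String[] args) {
        Tank tank = new Tank("example tank");
        Tray tray = new Tray("A1");
        Plant plant = new Plant(17);
        tank.addTray(tray);
        tray.addPlant(plant);
        System.out.println(plant.tank().description); // prints "example tank"
    }
}
```

Routing every change through the owning side (Tank for trays, Tray for plants) is one common design choice for bidirectional associations; an object database may provide its own mechanism for this, in which case the helpers above would not be needed.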














APPENDIX C
ALLSTAR USER EVALUATION

Below is the complete (albeit informal) user evaluation survey given to 22 members of
the ALS research staff at the Kennedy Space Center. Although the questions attempt to
remain objective in scope, they are by no means an exhaustive treatment of all issues
raised by ALS staff usage of the ALLSTAR database application. Of particular interest
to this project were replies describing the typical flow of existing ALS research
methods and initial perceptions of the new ALLSTAR-based interface and tools.
Responses to the evaluation survey were collected through the ALLSTAR application
itself by means of a specialized text-entry panel. The survey requested anonymity of
responses, as part of an effort to encourage frank comments. This evaluation was not
intended to provide an in-depth picture of ALLSTAR's reception or to determine its
success; as such, no statistical analysis is applied here. Likewise, the evaluation
responses themselves have been withheld because of the candor afforded by their
anonymous submission.



Only brief answers are requested to the seven (7) questions below; however, if time
allows, feel free to elaborate on your comments.

1. Do you feel that your pace of ALS-related research is in any way hindered by
limitations stemming from the current mode of access to ALS experimental data
(UNDACE, Oracle, Microsoft Access, etc.)?

2. If you answered 'yes' to #1, please try to briefly explain why. For example, issues
regarding data set size, network server bandwidth, database model/schema, parametric
dimensionality of sensor readings, or software tools used to retrieve data records.

3. Which software programs other than UNDACE/ALSDMG do you typically use in
analyzing the results of an ALS experiment, either for mathematical/statistical analysis or
other computations?

4. From your brief introduction to the ALLSTAR application, do you see any merit in the
continued storage of ALS-related research in an online-accessible, object-oriented
environment such as this one?

5. Please briefly explain your reasoning behind your response to #4.

6. One of the intended goals of ALLSTAR is to provide a sound framework for future
software-related development that could one day allow computers to take greater control
over scientific research operations; for instance, suggest appropriate experiments or
deduce unintuitive (to a human) correlations between seemingly unrelated research
environment components. In your opinion, might statistical analysis of this kind be able
to assist the immediate goals of ALS program research?

7. Another long-term goal of ALLSTAR is to provide the basis of an Internet-accessible
repository of scientific research accessible by both humans and machines for highly
efficient knowledge representation and logical deduction purposes. What concerns, if
any, do you have regarding the need for journal publication of results versus
straight-to-Internet 'publication' of findings?
















LIST OF REFERENCES


17 US Code. Sec. 102. 2002.

Beck, H., and H. S. Pinto. 2002. Overview of approach, methodologies, standards, and
tools for ontologies. In Proc. Third Agricultural Ontology Workshop. Food and
Agriculture Organization of the United Nations.

BioBLAST. 2002. BioBLAST: Better Learning Through Adventure, Simulation and
Telecommunications. Wheeling, W.V.: Classroom of the Future.

Campbell, W. J. 1987. The development of a prototype intelligent user interface
subsystem for NASA's scientific database systems, NASA Technical Memorandum
87821.

Chaudhri, A. B., and M. E. S. Loomis. 1998. Object Databases in Practice. Upper Saddle
River, N.J.: Prentice-Hall, Inc.

Chen, P. 1976. The entity-relationship model: toward a unified view of data. ACM
Transactions on Database Systems 1(1): 9-36.

Clark, K. G. 2003. Creative Comments: On the Uses and Abuses of Markup. XML.com.
Available at: www.xml.com/pub/a/2003/01/15/creative.html. Accessed 21 Feb
2003.

Codd, E. F. 1970. A relational model of data for large shared data banks. Communications of the
ACM 13(6): 377-387.

DB2. 2003. DB2 Universal Database. Ver. 8.1. White Plains, N.Y.: IBM Corp.

DeCoste, D. 2001. Visualizing massive multivariate time-series data. In Information
Visualization in Data Mining and Knowledge Discovery. U. Fayyad, G. Grinstein,
and A. Wierse, eds. San Francisco: Morgan Kaufmann Publishers.

Euzenat, J. 2002. Eight questions about semantic web annotations. IEEE Intelligent
Systems 17(2): 55-62.

Excel. 2002. Excel 2002. Redmond, Wash.: Microsoft Corp.

Ford, R. 1998. Our Moments Have All Been Seized. New York Times, 27 Dec., sec. 4: 9.

Gomez-Perez, A., and O. Corcho. 2002. Ontology languages for the semantic web. IEEE
Intelligent Systems 17(1): 54-60.

Heflin, J., and J. Hendler. 2001. A portrait of the semantic web in action. IEEE Intelligent
Systems 16(2): 54-59.

Hoare, C. A. R. 1962. Quicksort. The Computer Journal 5(1): 10-15.

Institute of Food and Agricultural Sciences (IFAS) Information Technologies, University
of Florida. 2000. Web Taxonomy. Available at: orb.ifas.ufl.edu. Accessed 24 Apr
2003.

Java. 2003. Java Platform 2. Ver. 1.4.1. Santa Clara, Calif.: Sun Microsystems, Inc.

JFreeChart. 2003. JFreeChart. Ver. 0.9.8. Hertfordshire, U.K.: The Object Refinery.

Kogut, P., S. Cranefield, L. Hart, M. Dutra, K. Baclawski, M. Kokar, and J. Smith. 2002.
UML for ontology development. The Knowledge Engineering Review 17(1): 61-64.

Lassila, O. 1998. Web metadata: a matter of semantics. IEEE Internet Computing 2(4):
30-37.

Lee, G. 1993. Object-Oriented GUI Application Development. Englewood Cliffs, N.J.:
Prentice Hall.

Li, W. 2002. Intelligent information agent with ontology on the semantic web. In Proc.
World Congress on Intelligent Control and Automation 2: 1501-1504.

Loomis, M. E. S. 1995. Object Databases: The Essentials. Boston, Mass.:
Addison-Wesley Longman Publishing Co., Inc.

Lopatenko, A. 2001. Information retrieval in current research information systems.
Position paper, Workshop on Knowledge Markup and Semantic Annotation at
K-CAP 2001.

MATLAB. 2002. MATLAB. Ver. 6.5. Natick, Mass.: The Mathworks, Inc.

National Science Digital Library. 2003. Available at: www.nsdl.org. Accessed 19 Apr
2003.

O'Keefe, S. 2002. "Pioneering the Future." Syracuse University. Syracuse, N.Y. 12 Apr.

Object Management Group. 2003. OMG-Unified Modeling Language, v1.5. Needham,
Mass.: OMG Inc.

ObjectStore PSE Pro. 1999. ObjectStore PSE Pro. Ver. 6. Bedford, Mass.: Progress
Software.

OIL. 2000. Ontology Inference Language. On-To-Knowledge.

Open Archives Initiative. 2003. Available at: www.openarchives.org. Accessed 19 Apr
2003.

Oracle Database. 2001. Oracle9i Database. Redwood Shores, Calif.: Oracle Corp.

Public Library of Science. 2003. Available at: www.publiclibraryofscience.org. Accessed
19 Apr 2003.

Resource Description Framework. 2001. World Wide Web Consortium. Available at:
www.w3.org/rdf. Accessed 9 Jan 2003.

Semantic Web. 2001. World Wide Web Consortium. Available at: www.w3.org/2001/sw.
Accessed 9 Jan 2003.

Simon, H. 1971. Designing organizations for an information-rich world. In Computers,
Communications and the Public Interest, 37-72. M. Greenberger, ed. Baltimore:
Johns Hopkins Press.

SPSS. 2002. SPSS. Ver. 11.5. Chicago, Ill.: SPSS, Inc.

SQL Server. 2000. SQL Server 2000 Enterprise Edition. Redmond, Wash.: Microsoft
Corp.

Strayer, R. F., B. W. Finger, M. P. Alazraki, K. Cook, and J. L. Garland. 2002. Recovery
of resources for advanced life support space applications: effect of retention time
on biodegradation of two crop residues in a fed-batch, continuous stirred tank
reactor. Bioresource Technology 84: 119-127.

van Harmelen, F., P. F. Patel-Schneider, and I. Horrocks. 2001. Reference Description of
the DAML+OIL Ontology Markup Language. Available at:
www.daml.org/2000/12/reference.html. Accessed 9 Jan 2003.

Wheeler, R. M., C. L. Mackowiak, G. W. Stutte, J. C. Sager, N. C. Yorio, L. M. Ruffe, R.
E. Fortson, T. W. Dreschel, W. M. Knott, and K. A. Corey. 1996. NASA's biomass
production chamber: a testbed for bioregenerative life support studies. Advances in
Space Research 18: 215-224.

Wieland, P. O. 1994. Designing for human presence in space: an introduction to
environmental control and life support systems. NASA Reference Publication 1324.















BIOGRAPHICAL SKETCH

Christopher Davidson received a Bachelor of Science in Computer Engineering from the
University of Florida in 1998. His research interests include artificial intelligence,
computational game theory, and applying computerized tools of higher reasoning to
classical fields of science, including agriculture, biology and statistics.

In his spare time, he enjoys displaying his engineering might by dismantling household
gadgets (toasters, televisions, bicycles, you name it) and scratching his head while he
tries unsuccessfully to reconstruct them.