PAGE 56


Adding a resource to our Repository is a three-step process. This page provides you with an interface to upload the document (resource) to our repository. This is Step 1 of the entire process. In this step, you will upload the resource to our repository. Clicking the "Go to Step II" button takes you to Step 2, where you'll be presented with a form to fill out some book-keeping information about the resource you are adding. In Step 3, additional information about the resource needs to be filled in.

Please note that your document will not be uploaded to our repository until all three steps are completed.

Step I: Upload the resource to the Repository
Select a file to upload:


PAGE 57


Select a category:
<% While (NOT selectCategoryRSet.EOF) %>
<% selectCategoryRSet.MoveNext()
Wend
If (selectCategoryRSet.CursorType > 0) Then
  selectCategoryRSet.MoveFirst
Else
  selectCategoryRSet.Requery
End If %>

  

PAGE 58

Citation
Design and Implementation of An Intranet Driven Radiology Knowledge Bank

Material Information

Title:
Design and Implementation of An Intranet Driven Radiology Knowledge Bank
Creator:
Nair, Prashant ( Author, Primary )
Copyright Date:
2008

Subjects

Subjects / Keywords:
Application service providers ( jstor )
Database design ( jstor )
Databases ( jstor )
Information resources ( jstor )
Information technology ( jstor )
Keyword searching ( jstor )
Keywords ( jstor )
SQL ( jstor )
Typographic fonts ( jstor )
Web servers ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Prashant Nair. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
5/1/2005
Resource Identifier:
78140620 ( OCLC )



Full Text

PAGE 1

DESIGN AND IMPLEMENTATION OF AN INTRANET DRIVEN RADIOLOGY KNOWLEDGE BANK By PRASHANT NAIR A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE UNIVERSITY OF FLORIDA 2003

PAGE 2

Copyright 2003 by Prashant Nair

PAGE 3

To my wonderful parents, beloved Sangi and brother Sundaresan.

PAGE 4

iv ACKNOWLEDGMENTS I would like to thank Dr. Douglas Dankel for giving me the opportunity to work on this topic under his supervision and for his valuable guidance in shaping my thesis. I want to thank Dr. Chris Sistrom for spending a lot of time in providing excellent feedback and for striking the perfect balance between providing direction and encouraging independence. I would also like to thank Dr. Joachim Hammer for serving on my committee and for the knowledge he has imparted through his excellent courses in the area of databases. I would also like to thank Dr. Honeyman for giving me the opportunity to enter the field of information technology in the medical community. I am grateful to my parents and my sister for their constant support and encouragement in every decision I made in shaping my career. I learned from my father to observe mildness of temper and a cheerful disposition in all situations. I am forever indebted to Sangi for her undying faith and love, for reminding me of my priorities and keeping things in perspective. I thank her for teaching me that “Hard work never goes to waste.” I would not have made it this far without her showing me light in darkness. My deepest admiration and thanks go to my brother Sundaresan for teaching me to laugh and to make others laugh and for being a source of constant inspiration to me through all ups and downs. Above all, I thank God for listening to my prayers and guiding me in every step of life.

PAGE 5

TABLE OF CONTENTS

ACKNOWLEDGMENTS ... iv
LIST OF TABLES ... vii
LIST OF FIGURES ... viii
ABSTRACT ... ix

CHAPTER

1 INTRODUCTION ... 1
  1.1 Motivation ... 2
  1.2 Design Approach ... 3
  1.3 Challenges and Contributions ... 4
  1.4 Document Organization ... 6
2 RELATED RESEARCH AND TECHNOLOGY ... 7
  2.1 Application-driven Repositories ... 7
  2.2 Web-enabled Repositories ... 9
    2.2.1 File Transfer Protocol Technology ... 9
    2.2.2 HTTP-based Sites ... 10
    2.2.3 Repository Engines-driven Sites ... 10
  2.3 Technologies Used ... 12
    2.3.1 Open Database Connectivity ... 12
    2.3.2 Microsoft® Internet Information Services ... 14
    2.3.3 Active Server Pages ... 15
    2.3.4 Structured Query Language ... 15
  2.4 Summary and What Is Next ... 16
3 SYSTEM DESIGN AND FEATURES ... 17
  3.1 Database Architecture ... 17
    3.1.1 baseTable ... 19
    3.1.2 attributeTable ... 20
    3.1.3 attributeDataTable ... 21
    3.1.6 categoryTable ... 22
    3.1.4 userInfoTable ... 22

PAGE 6

    3.1.5 permissionsTable ... 22
  3.2 Registration and Login Process ... 23
    3.2.1 Registering with RKB ... 23
    3.2.2 Login Page ... 24
      3.2.2.1 Action 1, password check ... 24
      3.2.2.2 Action 2, session creation ... 24
  3.3 Summary and What Is Next ... 26
4 RKB'S DESIGN AND A SAMPLE SESSION ... 27
  4.1 Add a Resource ... 28
    4.1.1 Step I, Submit the Resource ... 30
    4.1.2 Step II, Submit Book Keeping Information ... 32
    4.1.3 Step III, Submit Meta-data Information ... 32
  4.2 Search Resources ... 34
    4.2.1 Basic Search ... 34
    4.2.2 Advanced Search ... 35
  4.3 Browse Resources ... 38
  4.4 Summary and What Is Next ... 40
5 CONCLUSION AND FUTURE WORK ... 41
  5.1 RKB and Its Contributions ... 41
  5.2 Limitations and Future Work ... 42
    5.2.1 Synonyms and Spelling Errors ... 42
    5.2.2 Rank Based Searches ... 43

CODE LISTING ... 45
LIST OF REFERENCES ... 59
BIOGRAPHICAL SKETCH ... 61

PAGE 7

LIST OF TABLES

Table (page)

1 Two main categories of repositories ... 7
2 Example set of tuples in the attributeTable for the “thesis” category ... 21
3 Example set of tuples in the attributeDataTable for the “thesis” category ... 21
4 Automatic saving of documents based on their category name ... 30

PAGE 8

LIST OF FIGURES

Figure (page)

1 High-level design ... 5
2 ODBC components ... 14
3 Radiology knowledge bank database schema ... 18
4 Registration page ... 23
5 Partial Global.ASA code ... 25
6 The Main.asp Page of RKB ... 27
7 The Code for secure access to ASP pages ... 28
8 Steps in adding a resource to the repository ... 29
9 Step I – Screen shot of the Submit resource page ... 30
10 Step II – Submit book keeping information ... 32
11 Step III – Submit meta-data information ... 33
12 Basic search page ... 34
13 SQL query executed for keyword-based search ... 35
14 Advanced search page I ... 36
15 Advanced search page II ... 37
16 The SQL query generated from the advanced search shown in Figure 4-10 ... 38
17 Screen shot of the Browse resource page ... 39
18 The list of resources in the Thesis category ... 39

PAGE 9

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

DESIGN AND IMPLEMENTATION OF AN INTRANET DRIVEN RADIOLOGY KNOWLEDGE BANK

By Prashant Nair

May 2003

Chair: Douglas D. Dankel II
Major Department: Computer and Information Science and Engineering

With the rapid growth of the Internet, the number of documents and resources available online has increased exponentially. Existing document organization and retrieval systems are not equipped with adequate management capabilities. This necessitates the development of a powerful, easy-to-manage tool that is more intuitive and easier to understand.

Physicians in the Department of Radiology at Shands Hospital, University of Florida, regularly consult teaching files, anatomic diagrams, pathological codes, ACR codes, normal measurement tables, and magnetic resonance imaging (MRI) images in analyzing patient anomalies and similar cases. Given the Department of Radiology's rich collection of resources, it becomes cumbersome and time-consuming to locate the right information to complete a task. This thesis involves the development of a Knowledge Bank to centralize these otherwise scattered resources and present these data in a more structured manner. The repository,

PAGE 10

called the RKB (Radiology Knowledge Bank), provides three basic functions: adding a resource, searching for a resource, and browsing resources. This thesis presents an overview of the system and the challenges involved in designing and implementing this flexible and scalable repository.

PAGE 11

CHAPTER 1
INTRODUCTION

The World Wide Web has seen exponential growth over the past few years in the number of documents that are created, stored, and managed, with these documents scattered across many different formats. Existing document filing and retrieval systems are either very rigid or do not provide the right management capabilities. The significant barrier to using this information has shifted from providing access to finding the right information in this huge distributed collection. We need a more powerful and easy-to-manage system that is more intuitive and easier to understand.

A knowledge repository (or Knowledge Bank) is a collection of data thinly structured by semantic relations. This repository allows the centralization of an otherwise scattered collection of information and presents these data in a more structured manner. It also provides a means for easy retrieval of these resources through queries or a hierarchical tree-structured view.

It is not necessary that all of the data be formally represented. The data may exist in the form of images or sentences in natural language. The knowledge repository takes these data, gives them a semantic meaning, and represents them in a more formal manner. In addition, a well-implemented repository may check these data for completeness, classify them, and extract relevant and precise parts to answer a query. It is important to understand that a repository's aim is to supply information that the user can interpret and apply in problem solving. The processes that a repository must provide are adding resources,

PAGE 12

storing these resources, searching resources efficiently, and acquiring (downloading) these resources.

The key idea behind a repository is to provide a large space to store information and to provide mechanisms to search and make these resources available. The emerging field of "resource re-usability" has led to widespread use of repositories. Rapid retrieval of information and subsequent reuse of knowledge has resulted in a sharper focus on methods for representing and storing artifact knowledge [Szykman et al. 2000].

1.1 Motivation

The Department of Radiology at Shands Hospital, University of Florida, has a large number of doctors, attending physicians, nurses, residents, faculty members, and students, in addition to the patients who come for diagnosis every day. Case files are maintained for every patient, in part because these case files help students and doctors better understand the patient’s pathological condition before they begin to actually diagnose the patient. A case file typically includes medical images from X-rays and MRI scans, and other important descriptions of the anomaly. To make teaching effective, a complete database of cases illustrating each pathological condition is maintained in a Radiology Teaching File (i.e., the RTF). Cases are added on a daily basis, making the RTF a growing database.

Teaching files are one of the many resources used in the Department of Radiology. Others include full-text journal articles, PowerPoint slides, anatomic diagrams, Normal Measurement Tables and Graphs, GIFs, JPEGs, Microsoft® Word documents, MRI scanned images, ACR codes, pathological codes, and references from the Internet. Proper

PAGE 13

retrieval and management of these resources using a repository become inevitable considering the large volume of data generated on a daily basis. If acquiring and managing these resources is perceived as cumbersome and unproductive, then the repository will not be used at all. So, in addition to the basic functionalities listed above, a repository must have other features to enhance its usefulness, including:

• A means to add, capture, and store precise information about these resources. These information units are called meta-data since they provide information about the document itself and the document contents.
• Sophisticated search mechanisms that allow queries to be easily formulated and that return correct results.
• A hierarchical categorization of the resources for easy perusal as an alternative to searching. This feature should provide the end-user with a "dynamic Table Of Contents" view.
• Up-to-date documentation about the resources that allows their semantics and quality to be evaluated.

Our research concentrates on online/web-based repositories since they better suit our needs.

1.2 Design Approach

The requirements listed above provided the structure for the development of the online repository. The heart of the design is a Meta-Data Bank and a Web Server (see Figure 1-1). The Web server receives requests from different client browsers. The Microsoft Internet Information Services (IIS) web server is capable of running suites of Visual Basic scripts called Active Server Pages (ASP). These scripts act like hidden APIs for the client requests by interacting with the backend database and responding with the requested information. The clients are unaware of the underlying mechanisms by which the script connects to the database and retrieves the requested information.
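
The request path described in this section (a form submission reaches a server-side script, the script queries the meta-data, and the matches come back as HTML links) can be sketched roughly as follows. This is an illustrative Python stand-in, not the thesis's ASP/VBScript code: the standard-library sqlite3 module replaces the ODBC-connected database, and the table `resource_meta`, its columns, the sample rows, and the function names are all hypothetical.

```python
import html
import sqlite3


def build_demo_bank() -> sqlite3.Connection:
    """Create an in-memory stand-in for the meta-data bank (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource_meta (title TEXT, keywords TEXT, file_path TEXT)"
    )
    conn.executemany(
        "INSERT INTO resource_meta VALUES (?, ?, ?)",
        [
            ("Knee MRI teaching file", "mri knee", "/001/File1.ppt"),
            ("ACR code table", "acr codes", "/002/File2.htm"),
        ],
    )
    return conn


def handle_search(conn: sqlite3.Connection, keyword: str) -> str:
    """Server-side step: query the meta-data, then format matches as HTML links."""
    # A parameterized query keeps user input out of the SQL text itself.
    rows = conn.execute(
        "SELECT title, file_path FROM resource_meta "
        "WHERE keywords LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    ).fetchall()
    items = "".join(
        f'<li><a href="{html.escape(path, quote=True)}">{html.escape(title)}</a></li>'
        for title, path in rows
    )
    # The browser only ever sees this HTML, never the database access behind it.
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    print(handle_search(build_demo_bank(), "mri"))
```

The client receives only the rendered list, which mirrors the point that browsers are unaware of how the script reaches the database.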

PAGE 14

The Meta-Data Bank (MDB) is an SQL Server database. It is called a Meta-Data Bank since it stores information about the submitted resource and its contents. These data become advantageous when one needs to locate a specific resource. For example, if a user is searching for a particular resource, he/she can complete an HTML form on an Active Server Page with any information he/she has about the resource. When the web server receives this information from the client, it initiates an instance of the search script and executes it. This script, running on the web server, first establishes a connection with the SQL server using an ODBC (Open Database Connectivity) bridge. The information units passed to the script are compared against the information stored in the data bank, and links for downloading the closest-matching resources are returned. These resources are categorized and stored in subdirectories over several RAID (Redundant Array of Inexpensive Disks) disks on Windows® 2000 servers. The ASP script then formats the returned results into HTML (Hyper Text Markup Language) and sends the result to the client browser that initiated the request. The browser on the client’s machine renders the HTML source code into the proper format.

1.3 Challenges and Contributions

The goal of this research is to implement a repository that can extract important information about the resources being added to it and then use this information for searching or browsing resources. There are many challenges involved in building such a knowledge bank, including:

PAGE 15

[Figure 1-1. High-level design. Components shown: client browsers 1 through N; Microsoft IIS web server running ASP scripts; ODBC bridge; Meta Data Bank (SQL Server); storage units (RAID disks, Dell servers) holding uploaded files in subdirectories such as \001 and \002 under \KnowledgeBank\UploadedFile.]

• Resources are added to the repository in their raw ASCII or binary format with no embedded semantic information. The knowledge bank must extract this semantic information and store it in a format for future use (viz. searching and browsing for resources).
• The browse-for-resources facility generates a hierarchical tree of resources divided into appropriate subjects/categories. The knowledge bank should generate this resource tree dynamically so that, should a particular subject category be deleted or moved, it can detect this change and update its view. Our implementation accomplishes this and generates a dynamic Table of Contents view.
• Students, faculty, physicians, nurses, and doctors will access our knowledge bank. This requires different access levels for the different parts of the system/application. For example, students and residents should not be allowed to delete or modify any resources or categories. Our implementation follows the structure of the UNIX system's group access levels. All members fall into a certain

PAGE 16

group and are given privileges according to the access levels specified for that group.

The most important contribution of this thesis is a detailed description of the RKB architecture and design. Our design addresses each of the challenges stated above. This thesis also highlights the main features of the RKB and shows that our design is flexible, scalable, and robust.

1.4 Document Organization

The remainder of this thesis is organized as follows. Chapter 2 presents a summary of related research in the field of knowledge repositories. Chapter 3 provides an in-depth description of the design and architecture of our Radiology Knowledge Bank, the tools and programming language used, and a brief discussion of some of the important queries used. The RKB implementation details and several illustrative examples using snapshots are presented in Chapter 4. Conclusions and future work are presented in Chapter 5.
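
As an illustration of the group-based access levels described in Section 1.3, the following minimal Python sketch maps each group to a set of privileges and checks an action against it. It is not the RKB implementation: the group names, the `Permission` flags, and the specific privileges granted are assumptions, apart from the stated rule that students and residents may not delete or modify resources or categories.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical privilege flags; the thesis does not list the actual set."""
    VIEW = auto()
    ADD = auto()
    MODIFY = auto()
    DELETE = auto()


# Assumed group-to-privilege mapping. Only the restriction on students and
# residents (no MODIFY, no DELETE) comes from the text.
GROUP_PERMISSIONS = {
    "student": Permission.VIEW,
    "resident": Permission.VIEW,
    "faculty": (
        Permission.VIEW | Permission.ADD | Permission.MODIFY | Permission.DELETE
    ),
}


def is_allowed(group: str, action: Permission) -> bool:
    """Return True if every member of `group` may perform `action`."""
    # Unknown groups get no privileges at all.
    granted = GROUP_PERMISSIONS.get(group, Permission(0))
    return action in granted


if __name__ == "__main__":
    print(is_allowed("faculty", Permission.DELETE))
    print(is_allowed("student", Permission.DELETE))
```

Because privileges attach to the group rather than to the individual, changing one entry in the mapping changes what every member of that group can do.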

PAGE 17

CHAPTER 2
RELATED RESEARCH AND TECHNOLOGY

Significant research has been performed in the area of repositories during the past few years. Based on the user interface, repositories can be classified broadly into two main categories: application-driven repositories and web-based repositories, as shown in Table 2-1. While it is not possible to review all work performed in the area of application-driven repositories, we provide a brief summary of the important research done in this area. Since our knowledge bank belongs to the second category, we provide an in-depth review of major research in this area with some examples. Please note that the terms resources and components will be used interchangeably throughout the remainder of this chapter, as they are common terms used in this research area with the same meaning.

Table 2-1. Two main categories of repositories

Application-driven Repository: Applications written in a programming language (C++, Java). They need to be installed on all machines and are therefore platform dependent. Example: CASE tools.

Web-based Repository: Users need a Web browser (Internet Explorer, Netscape Navigator) to access the repository by clicking links on the page, so these repositories are platform independent. Examples: FTP sites, HTTP-based sites, Repository in a Box (RIB), and MORE.

2.1 Application-driven Repositories

Application-driven repositories execute as independent applications. The application code is written in a programming language such as C++ or Java (for example, with JFC/Swing), which requires the application to be installed on each machine on which it will be run.

PAGE 18

Another limitation of this approach is that the code developed for the application is sometimes machine-architecture specific. Code written and run on a 32-bit Windows operating system may not necessarily run on a UNIX workstation and, hence, may require modifications.

Computer-Aided Software (or System) Engineering (CASE) technology involves using computer-based information systems to assist in elicitation, storage, management, and analysis of such data [Chen & Sibley 1991]. It is important to note that CASE is a technology and not the application itself. It provides tools for designing repositories. Most CASE tools support various structured techniques, such as data flow diagrams for structured analysis and structured charts for structured designs.

First-generation CASE tools (e.g., PSL/PSA [Teichroew & Hershey 1977]) in the early 1970s were mainframe based, employing non-graphical interfaces. These systems failed to adequately capture and display the needed information. In the early 1980s, CASE tools emerged that provided better analysis functions to enforce the principles and guidelines of the structured techniques supported by the tools. Information captured in graphics was stored in a project dictionary that could be shared with other CASE tools from the same vendors. Excelerator from Index Technology [Index Technology 1989] is an example of such a CASE product. Some of the important functions performed by CASE tools are:

• Data integrity: validates entries to the repositories and ensures consistency among related objects.
• Information sharing: provides mechanisms for sharing information among multiple developers and vendors.

PAGE 19

In addition, CASE tools provide a semantically rich tool interface. The meta-model contains semantics that enable a variety of tools to interpret the meaning of the data stored in the repository. CASE tools also provide project management tools that register information about software applications, the characteristics of each project, and the organization's general process of software development phases, tasks, and deliverables.

2.2 Web-enabled Repositories

Web-based repositories provide HTTP-based access to the repository. The user clicks on defined links to access various features of the repository. Typically, an associated database stores the resources and information about these resources. The HTML pages provide statically or dynamically generated links to access resources in the database over an ODBC bridge or some other connection. This approach has two significant advantages over the previous approach. First, web-based repositories are not limited by varying operating systems and computer architectures. Unlike application-driven repositories, in which the application needs to be installed on each computer, web-based repositories can be accessed from any computer with a web browser. Second, web-based repositories provide enhanced security features such as group-based access to confidential resources.

In the following subsections we examine three major families of web-based repositories: FTP technology, HTTP-based sites, and repository engine-driven sites. We describe each of these approaches and compare and contrast them with each other.

2.2.1 File Transfer Protocol Technology

The File Transfer Protocol was created to ease the transfer of files from one host to another by providing space for storing and accessing files. Although FTP is a primitive repository technology, we include it in our study since it formed the foundation for

PAGE 20

intelligent repositories of the modern age. The underlying file system, directories, and sub-directories provide the backbone of the FTP technology, with FTP providing the interface to acquire and submit resources. Generally, a README text file describing the structure is also available to support searching or browsing resources. FTP technology itself does not provide any mechanism to search components in the file system or the ability to extract and attach any semantic meaning to each component. It does not provide any alternative category view. Without performing an exhaustive manual search by downloading every component in a category and examining its documentation, it is virtually impossible to find the right component (i.e., finding the right resource is left to the user's interpretation of the category name and the contents of the category).

2.2.2 HTTP-based Sites

Many websites provide shared use of software and other resources. Typically, these sites provide component descriptions of the resources in the form of meta-data, which the users can browse or search. One major drawback of such sites is that the categories are pre-defined, so users cannot add new categories or modify any attributes. As a result, these repositories are typically small, manually managed hobby sites.

2.2.3 Repository Engines-driven Sites

Repository engine-driven sites use a repository engine, implemented using a set of tools, to provide web-based repository support. Repository in a Box (RIB) and Multimedia Object Repository Environment (MORE) [Eichmann et al. 1994] are the two widely used tools in this family.

PAGE 21

RIB, implemented as a suite of PERL scripts running against a web server using CGI, is a freely available repository developed at the University of Tennessee. RIB provides all the important features expected of a repository, including defining new domains for resources and keyword-based searching against component descriptions. RIB also provides the ability to represent the contents of a remote repository as though it were local. RIB stores meta-data about the component in the repository and allows users to browse and search the domain hierarchy by entering keywords. The search results include a brief description and a location to download the component.

RIB, however, does not support secure access to web pages having important components viewable only by a particular subset of users. RIB does not provide any mechanisms to define groups of users with certain privileges. For example, students can view and download all resources that are available to faculty members and vice versa. All resources are available to all members registered with RIB.

MORE is a web-based system developed at the University of Houston as a part of the Repository Based Software Engineering program. Like RIB, MORE stores meta-data about components and provides browsing, categorization, description, and searching functions. MORE also allows the definition of groups and users, searching based on synonyms, and pattern-based searching. MORE uses the notion of a “class,” where components are placed into different classes. This facilitates the idea of maintaining different attributes for different components. The basic design goals of the MORE system are:

• Providing mechanisms to define group-level access to proprietary sub-collections of components.
• Optimizing the storage of meta-data.

PAGE 22

• Providing a Web-enabled interface to all users.

MORE uses Mosaic as the primary web browser and encourages all members to use it. Several versions of MORE were developed, including MORE2.0 and MOREplus. The main extensions in these newer versions allow administrators to define new classes and make changes to the attributes of classes. Also, users can search within certain classes using enhanced search forms.

2.3 Technologies Used

This section describes the various technologies used in implementing the RKB. Microsoft® Access forms the main database that stores meta-data information. An Open Database Connectivity (ODBC) bridge connects the database and the Microsoft® Internet Information Services (IIS) web server. The web server drives the main web pages using Active Server Pages (ASP) technology. Search forms on the web pages utilize Structured Query Language (SQL) to query databases and retrieve the requested information. The following subsections provide a brief summary of each of the terms used above.

2.3.1 Open Database Connectivity

The ASP scripts interact with the databases over an ODBC connection, which is used to access relational or indexed sequential access method (ISAM) databases [Microsoft ODBC 2001]. ODBC allows access to a wide range of relational databases and local ISAM data. SQL statements written in the scripts can be executed on the database over this ODBC connection. Values from the database can be placed in program variables and vice versa.

ODBC defines a call level interface (CLI), which is a set of low-level function calls that allow client applications and server applications to exchange instructions and share data. The client and server applications may or may not reside on

PAGE 23

the same machine. A CLI uses the native programming language to call functions, so it does not need any extension to the underlying programming language.

The ODBC interface allows applications to access data from database management systems (DBMS). A single application can access diverse back-end database management systems, permitting maximum interoperability. An application developer can develop, compile, and ship an application without targeting a specific DBMS product. Modules called drivers can be added later. The ODBC interface thus defines the following:

- Libraries of ODBC function calls that allow an application to connect to a DBMS, execute SQL statements, and retrieve results.
- A standard way to connect and log on to the DBMS.
- A standardized representation for all data types.

Figure 2-1 illustrates the architecture with the ODBC components discussed in this chapter. The ASP pages that provide the interface to the repository form the top application layer. The rest of the components in the figure are explained below. The ODBC architecture has four components:

- Application – performs processing by passing SQL statements to, and receiving results from, the ODBC driver manager.
- Driver manager – a dynamic link library (DLL) that loads drivers on behalf of an application.
- Driver – a dynamic link library that processes ODBC function calls received from the driver manager. The driver may modify an application’s request so that the request conforms to the syntax supported by the associated DBMS.
- Data source – consists of a DBMS, the operating system the DBMS runs on, and the network (if any) used to access the DBMS.
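The call-level sequence described above (connect to a data source, execute SQL statements, and copy results into program variables) can be illustrated with a minimal sketch. This is an illustration only, not the RKB's code: it uses Python's standard sqlite3 module with an in-memory database as a stand-in for an ODBC data source, and the table name `resource` and its columns are hypothetical.

```python
import sqlite3


def open_demo_source():
    """Connect to a throwaway in-memory data source and load sample rows.

    The 'resource' table and its columns are hypothetical, for illustration.
    """
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE resource (title TEXT, year INTEGER)")
    connection.executemany(
        "INSERT INTO resource (title, year) VALUES (?, ?)",
        [("Chest atlas", 1999), ("Renal imaging", 2002)],
    )
    return connection


def fetch_titles(connection, min_year):
    """Execute a parameterized SELECT and copy the results into a Python list."""
    cursor = connection.cursor()
    cursor.execute(
        "SELECT title FROM resource WHERE year >= ? ORDER BY title",
        (min_year,),
    )
    return [row[0] for row in cursor.fetchall()]
```

As with ODBC, the application code above never names a specific DBMS product beyond the connection step; swapping the data source would, under this sketch's assumptions, leave `fetch_titles` unchanged.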

PAGE 24

Figure 2-1. ODBC components (diagram: one application at the top, the driver manager beneath it, three drivers beneath the driver manager, and one data source beneath each driver)

2.3.2 Microsoft® Internet Information Services

The Windows 2000 Server operating system comes with a web server called Internet Information Services (IIS 5.0) that enhances reliability, performance, management, security, and application services. IIS helps deploy web-based business applications, host and manage web sites, and share information securely across the Internet or an intranet [Microsoft IIS 2002]. IIS helps perform the following:

- Generate dynamic web pages by using Active Server Pages technology.
- Customize error messages and content expiration on web pages.
- Capture user and system information in log files for future analysis.

In addition, IIS provides the following management features:

- Wizards to help create custom web sites, folders, and sub-folders, and to perform other administrative tasks.
- Utility functions to set independent properties on each web page or on a group of web pages in a folder.
- Built-in administrative scripts to automate common administrative tasks across multiple servers.

PAGE 25

2.3.3 Active Server Pages

Static HTML pages cannot generate and display dynamic data, especially when databases are involved in the background. Instead, Active Server Pages (ASP), a server-side scripting technology for creating interactive and dynamic web pages [Microsoft ASP 2002], is used. An ASP file can include HTML as well as calls to COM components that perform a variety of tasks, including connecting to a database and processing and displaying dynamic data. When a browser requests an ASP file, the web server runs the server-side scripts, executes any script commands, and sends the resulting data back to the web browser. Because the scripts run on the web server, the server performs all the work involved in generating the HTML pages. Another important advantage of server-side technology is that it helps prevent plagiarism and data theft: server-side scripts cannot be readily copied because users cannot see the code involved. Finally, ASP can serve a large number of concurrent users consistently.

2.3.4 Structured Query Language

All database management systems (DBMS) provide a set of commands to work with the data stored in databases. The Structured Query Language (SQL) [Elmasari & Navathe 2001] is the most common and widely accepted query language. The American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) frame software standards, including standards for SQL. SQL defines the methods used to create and manipulate relational databases on all major platforms. SQL commands can be divided into two main sublanguages: the data definition language (DDL) and the data manipulation language (DML). DDL is used to create and destroy databases and database objects. Database administrators primarily use these commands during the setup and removal phases of a database project. CREATE, USE,

PAGE 26

ALTER, and DROP are the important SQL commands that fall under this category. After the database structure is defined with DDL, users can use DML to insert, retrieve, and modify database information. Some of the frequently used commands are INSERT, UPDATE, DELETE, and SELECT.

2.4 Summary and What Is Next

In this chapter we examined the types of repositories: application-driven and web-based repositories. We also discussed various technologies, namely ODBC and its components, Microsoft IIS, ASP, and SQL. In the next chapter we present a walkthrough of the web interface of the RKB and discuss the underlying implementation details.
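The split between DDL and DML described in Section 2.3.4 can be seen in a short sketch. It is a hedged illustration that assumes Python's sqlite3 module as the database engine rather than Microsoft Access; the `category` table is hypothetical, and the USE command is omitted because SQLite has no equivalent.

```python
import sqlite3


def ddl_then_dml():
    """Run DDL to define a table, DML to work with its rows, then DDL to drop it."""
    conn = sqlite3.connect(":memory:")

    # DDL: create and alter the structure (hypothetical table for illustration).
    conn.execute("CREATE TABLE category (catId TEXT PRIMARY KEY, catName TEXT)")
    conn.execute("ALTER TABLE category ADD COLUMN description TEXT")

    # DML: insert, modify, and retrieve data inside that structure.
    conn.execute("INSERT INTO category (catId, catName) VALUES ('003', 'Thesis')")
    conn.execute(
        "UPDATE category SET description = 'Graduate theses' WHERE catId = '003'"
    )
    rows = conn.execute(
        "SELECT catId, catName, description FROM category"
    ).fetchall()
    conn.execute("DELETE FROM category WHERE catId = '003'")
    remaining = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]

    # DDL again: remove the object itself.
    conn.execute("DROP TABLE category")
    conn.close()
    return rows, remaining
```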

PAGE 27

CHAPTER 3
SYSTEM DESIGN AND FEATURES

Chapter 1 presented a basic overview and design of the RKB. This chapter looks more closely at the implementation details of the RKB, mostly from the perspective of a user. We examine the database design, which forms the backbone of the RKB. This is followed by an in-depth explanation of the features of the RKB and a step-by-step walkthrough of all these features.

3.1 Database Architecture

A sound database design underpins the success of any application, so the database design formed the first phase of our implementation. In the subsequent sections we examine why we chose this database design and explain the important tables in the schema.

Requirements analysis is an important stage at the beginning of any project. After consulting with students, faculty, physicians, and doctors in the Radiology department, the following requirements were identified:

- An interface that is easy to use and understand and is accessible from anywhere in the department.
- The ability to add, search, and browse resources.
- Different access privileges for students, nurses, doctors, and faculty members.
- The ability to categorize resources into a hierarchical tree structure.
- The ability to add, modify, and delete categories and attributes in the database design.

Given the above points, we decided to implement a web-based repository accessible on the department's intranet. The knowledge bank has three main

PAGE 28

features: add a resource, search resources, and browse for resources. Figure 3-1 illustrates the schema design of our database and the relationships among tables.

Figure 3-1. Radiology knowledge bank database schema

For the discussion below, please note that the columns in the tables are denoted in the following format: columnName <fieldType>

PAGE 29

where columnName indicates the name of the column in the table and fieldType indicates the type of value for that column (e.g., string, integer, date, etc.).

3.1.1 baseTable

The baseTable is the main table that holds all bookkeeping information about all of the resources added to our knowledge bank. One tuple exists for every resource added. This table includes the following columns:

docId: This is a unique id assigned to every resource in the knowledge bank. In database terms this is the primary key of this table. The field type, autonumber, is provided by the Microsoft® Access database and means that a unique id is assigned to a new tuple automatically.

catName: This is the name of the category or class into which a particular resource falls. A large collection of categories is already defined in our RKB. If a user cannot find the right category for a particular resource, he/she can add the resource to the “Other” category. Details of this option follow. All category information is stored in another table.

Title: This field indicates the source of the resource. Some source fields are web resources, teaching files, etc.

openToPublic: The RKB allows users either to hide the resource being added from the public or to make it available to everybody. A value of 1 (yes) indicates that the resource is hidden and is not displayed while browsing or searching; a value of 0 (no) indicates that the resource is displayed.

linkToDest: This field holds the destination path where the resource is stored on the server (e.g., “C:\knowledgeBank\uploadedFiles\file231.jpeg”). An important point to note here is that every resource is added to a sub-folder that corresponds to that resource’s category. This is explained further in the next chapter.

contributorId: This indicates the name of the user who contributed the resource to the knowledge bank.

groupNum: All users fall into a certain group based on their access levels. Students and end users are given the least privileges and have a groupNum equal to 7. Administrators are at the highest level, with a group number of 1. This strategy is explained in detail in Section 3.1.5.

visibleDate: This indicates the date on which the added resource becomes visible to all members of the RKB. All dates are entered in MM/DD/YYYY format.

PAGE 30

This is an optional field. If this field is left empty, the current date is added automatically, indicating that the resource becomes visible immediately.

expiryDate: The date on which the particular resource goes offline. If this field is left empty, the current date plus 365 days is entered automatically, indicating that the resource will go offline after a year.

deleteDate: This field indicates the date on which the resource is permanently deleted from the knowledge bank. This field is introduced to prevent users from accessing outdated resources and to keep the knowledge bank from growing out of proportion. As with the above field, if this field is left empty, the addedDate plus 365 days is entered.

addedDate: This indicates the date on which the resource was added to the repository. If no value is entered, the current date is entered automatically.

3.1.2 attributeTable

Meta-data are extracted from the resources by asking the user who submits a resource to complete a few HTML forms; the data from the forms are collected into attributes. We maintain an attributeTable for this purpose. This table contains the following fields:

catId: This is an id for the category into which the particular resource falls.

attributeId: This is the primary key for this table.

attributeName: This is the name of the attribute for this category.

entriesAllowed: This indicates the number of entries allowed for a particular attribute. For example, a dissertation can have only one Title name; however, any number of keywords can be entered to identify the dissertation resource during keyword-based searches. Title name and keywords are attribute names, while dissertation is the category.

Type: This field records the type of each attributeName field. For example, Title is of type string, Publication Year is of type Date, and so on.

Constraints: This is a descriptive field identifying any constraints that exist on any of the attributeName fields. For example, Publication Year has to be 1970 or later, reflecting the policy of not adding any journal paper older than 1970.

PAGE 31

Length: This field indicates the allowable length of attributes. For example, ACR codes can be only 3-digit fields. N indicates any length.

Table 3-1 illustrates a set of tuples in the attributeTable for the Thesis category.

Table 3-1. Example set of tuples in the attributeTable for the “thesis” category.

catId  attributeId  attributeName     entriesAllowed  Type  Constraints  Length
003    1            Title of Thesis   1               Text  None         N
003    2            Author(s)         3               Text  None         N
003    3            Institution       3               Text  None         N
003    4            Department        3               Text  None         N
003    5            Publication Year  1               Date  >= 1970      N
003    6            Keywords          7               Text  None         N

3.1.3 attributeDataTable

Meta-data are stored in the attributeDataTable. The docId field in the attributeDataTable acts as a foreign key to the docId field in the baseTable. For each docId value in the baseTable, there may exist up to N docIds in the attributeDataTable, thus creating a one-to-many relationship.

ID: This field is the primary key of this table.

docId: This is the id for the resource added.

attributeName: This field indicates the name of the attribute. This field always matches the values for attribute names for this category in the attributeTable.

Data: This field stores the meta-data for each attribute name in the category for that resource.

Table 3-2 illustrates a typical entry for a thesis resource.

Table 3-2. Example set of tuples in the attributeDataTable for the “thesis” category.

ID   docId  attributeName     Data
96   6      Title of Thesis   Implementation of a repository
97   6      Author(s)         Dr. Douglas Dankel, Prashant Nair
98   6      Institution       University of Florida
99   6      Department        Computer & Information Science & Engineering
100  6      Publication Year  05/05/2003
101  6      Keywords          Knowledge representation, repository, Semantic
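The one-to-many relationship between the baseTable and the attributeDataTable can be sketched as follows. This is a simplified illustration under stated assumptions: it uses Python's sqlite3 module instead of Microsoft® Access, keeps only a few of the columns named above, and the helper functions `build_schema`, `add_resource`, and `attributes_of` are hypothetical names introduced here.

```python
import sqlite3


def build_schema():
    """Create cut-down versions of baseTable and attributeDataTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE baseTable ("
        " docId INTEGER PRIMARY KEY AUTOINCREMENT,"  # plays the role of autonumber
        " catName TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE attributeDataTable ("
        " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " docId INTEGER NOT NULL REFERENCES baseTable(docId),"  # foreign key
        " attributeName TEXT NOT NULL,"
        " Data TEXT)"
    )
    return conn


def add_resource(conn, cat_name, attributes):
    """Insert one baseTable tuple and one attributeDataTable tuple per attribute."""
    cursor = conn.execute("INSERT INTO baseTable (catName) VALUES (?)", (cat_name,))
    doc_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO attributeDataTable (docId, attributeName, Data)"
        " VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in attributes],
    )
    return doc_id


def attributes_of(conn, doc_id):
    """Return the (attributeName, Data) pairs stored for one resource."""
    rows = conn.execute(
        "SELECT attributeName, Data FROM attributeDataTable"
        " WHERE docId = ? ORDER BY ID",
        (doc_id,),
    )
    return rows.fetchall()
```

Storing each attribute as its own row is what lets a category gain or lose attributes without any change to the table definitions.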

PAGE 32

3.1.6 categoryTable

Every resource added to the repository falls into a category. If a resource cannot be added to any of the categories in the knowledge bank, a special category called “Other” is used. A user adds the resource to this “Other” category with a description of the category name he/she thinks is appropriate. This also requires the user to provide appropriate attribute names for that category. Administrators can add that category later after analysis.

3.1.4 userInfoTable

This table stores user-related information. Users are students, doctors, nurses, and others in the department who can add, search, and browse resources. This table contains the name of the user, their username, their password, their email address, the organization to which they belong, and the type of permission they have to access the repository. userId is the primary key and is linked to the baseTable as a foreign key, forming a one-to-many relationship.

3.1.5 permissionsTable

To enforce secure access to the repository, a UNIX-style group-level access system has been embedded into all pages. This access level system, with 1 indicating the highest privilege and 7 indicating the least privilege, assigns privileges as follows: 1 = Admin, 2 = Attending doctors, 3 = Fellows, 4 = Residents, 5 = Medical Students, 6 = Secretaries, and 7 = End users. This type of group-level access ensures that resources in certain categories are accessible to only a set of users. For example, nurses and students may not access any teaching files. Nurses could be added to group number 5.
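A group-level check of this kind can be sketched in a few lines. The group numbers and names come from the list above, but the comparison rule is an assumption made for illustration: the text does not spell out how a resource's group is compared with a user's group, so the sketch supposes that a user may access a resource when the user's group number is less than or equal to the group number required by the resource. The function name `can_access` is hypothetical.

```python
# Group numbers as listed in Section 3.1.5 (1 = highest privilege).
GROUPS = {
    1: "Admin",
    2: "Attending doctors",
    3: "Fellows",
    4: "Residents",
    5: "Medical Students",
    6: "Secretaries",
    7: "End users",
}


def can_access(user_group, resource_group):
    """Return True if the user's group is at least as privileged as required.

    Assumption (not stated in the thesis): lower numbers dominate higher
    numbers, so group 1 can access everything and group 7 only the
    resources open to group 7.
    """
    if user_group not in GROUPS or resource_group not in GROUPS:
        raise ValueError("unknown group number")
    return user_group <= resource_group
```

Under this assumed rule, a teaching file restricted to group 4 would be visible to residents, fellows, attending doctors, and administrators, but not to medical students or end users.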

PAGE 33

3.2 Registration and Login Process

To become a member of the RKB, a person must register and create an account. Once a username is created, the member can use that username and password to log in to the RKB. In the following sections we examine the important steps in this process.

3.2.1 Registering with RKB

First-time users become members of the RKB by using the Registration page. Figure 3-2 shows the registration page.

Figure 3-2. Registration page

PAGE 34

To register, users complete the required information, including their first and last name, their organization’s name, their email address, their preferred username, and a password. All entered data are collected in temporary ASP variables. To ensure that all users have unique usernames, the value entered for the username is compared with all username records in the userInfoTable. If a match is found, the user is asked to select another username. Once a unique username is entered, an INSERT operation is performed and this user’s data are added to the userInfoTable.

3.2.2 Login Page

Once an account is created, a user can log in using their username and password. Two actions are performed before access is given to the main page.

3.2.2.1 Action 1, password check

The password entered is matched against the registered password in the database. Users are allowed to enter the main page only after their password is successfully matched. This action requires a query into the database to retrieve the registered password and compare it against the entered value.

3.2.2.2 Action 2, session creation

Session creation is the critical step in the entire login process. All ASP pages hosted on a Microsoft® IIS web server refer to a “Global.ASA” file when the IIS application starts or stops, or when a web client starts or ends a browser session that accesses the application’s web pages. The Global.ASA file contains the following event procedures, which perform tasks such as initializing application variables, connecting to databases, and sending cookies:

PAGE 35

Application_OnStart(): This function is called before any .asp files are processed, that is, before any text or graphics are rendered and sent to the user’s browser.

Application_OnEnd(): This function is called when the application ends, which occurs when IIS shuts down. A call to this function removes the application objects from memory.

Session_OnStart(): This function is called by Active Server Pages the first time a request for this application arrives from a user’s browser.

Session_OnEnd(): This function cleans up objects when the user session ends, whether through a time-out, a user exit, or an abandonment of the session by the active server page.

Partial code of the Global.ASA file in our implementation is shown in Figure 3-3. For our application, it is important to maintain a variable that tracks whether the user is valid for that session. This is done through a session variable called “IsUserGood,” which is set to false in the Global.ASA’s Session_OnStart() function.

    '****************************************************
    ' Filename:  Global.ASA
    ' Author:    Prashant Nair
    ' Copyright: Department of Radiology, Shands Hospital
    '****************************************************
    Sub Session_OnStart()
        ' Every new session starts out unauthenticated.
        Session("IsUserGood") = False
    End Sub

    Sub Session_OnEnd()
        ' Invalidate the user, expire the cookie, and release session objects.
        Session("IsUserGood") = False
        Response.Cookies("currUser").Expires = DateAdd("d", 0, Date)
        Session.Contents.RemoveAll()
        Session.Abandon()
    End Sub

Figure 3-3. Partial Global.ASA code

When the user successfully logs in (after password authentication), IsUserGood is set to true and remains true in subsequent active server pages until the user logs off.

PAGE 36

When the user logs off, the Session_OnEnd() function is called, where IsUserGood is set to false and the clean-up process begins. The clean-up process involves clearing all temporary objects created by Microsoft® IIS or the active server pages and deleting all cookies.

3.3 Summary and What Is Next

We first examined the database structure of our RKB and explained the important fields in some of the tables in the schema. Next, we examined the registration and login processes. At this point, session variables and cookies have been created to track the activities of the logged-in user. The next chapter examines the implementation details of the three basic features of the RKB.
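The registration and login checks of Section 3.2 (unique username, password check, and the IsUserGood session flag) can be sketched as below. This is a hedged illustration, not the RKB's ASP code: it uses Python with sqlite3, a plain dictionary stands in for the ASP Session object, and, as a deliberate substitution, it stores a salted PBKDF2 hash instead of comparing a stored password directly. The function names and the `salt` and `pwHash` columns are hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3


def _hash(password, salt):
    """Derive a password hash (substituted for the plain comparison in the text)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def open_user_store():
    """Create a cut-down userInfoTable; salt and pwHash are hypothetical columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userInfoTable ("
        " username TEXT PRIMARY KEY, salt BLOB, pwHash BLOB)"
    )
    return conn


def register(conn, username, password):
    """Insert the user only if the username is not already taken."""
    taken = conn.execute(
        "SELECT 1 FROM userInfoTable WHERE username = ?", (username,)
    ).fetchone()
    if taken:
        return False  # the caller would ask the user for another username
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO userInfoTable (username, salt, pwHash) VALUES (?, ?, ?)",
        (username, salt, _hash(password, salt)),
    )
    return True


def login(conn, session, username, password):
    """Action 1: check the password. Action 2: set the session flag."""
    session["IsUserGood"] = False  # mirrors Session_OnStart in Figure 3-3
    row = conn.execute(
        "SELECT salt, pwHash FROM userInfoTable WHERE username = ?", (username,)
    ).fetchone()
    if row is not None and hmac.compare_digest(row[1], _hash(password, row[0])):
        session["IsUserGood"] = True
        session["currUser"] = username  # stands in for the currUser cookie
    return session["IsUserGood"]
```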

PAGE 37

CHAPTER 4
RKB’S DESIGN AND A SAMPLE SESSION

In Chapter 3 we examined the login process, which gives the user access to the main facilities of the RKB. Figure 4-1 shows Main.asp, which provides links to the three basic facilities: add a resource, search for a resource, and browse resources. At the bottom, this figure also shows the list of resources added by that user. This list is generated by a SELECT query into the database.

Figure 4-1. The Main.asp Page of RKB

As discussed in the previous chapter, a session variable “IsUserGood” is created to keep track of the validity of the user. If the user remains idle for more than 20 minutes, the

PAGE 38

user’s session ends automatically and he/she is logged off. The identity of the user, that is, the value of the user’s user id, is preserved in a cookie variable named “currUser.” A cookie is a small file that a server stores on the user’s computer. Each time the computer requests an ASP page, it sends the cookie as part of the request.

    <%@ LANGUAGE="VBSCRIPT" %>
    <%
    ' Redirect to the breach page if the cookie is missing
    ' or the session has not been validated.
    If Request.Cookies("currUser") = "" Then
        Response.Redirect "breach.htm"
    End If
    If Session("IsUserGood") = False Then
        Response.Redirect "breach.htm"
    End If
    %>

Figure 4-2. The code for secure access to ASP pages

Figure 4-2 shows the initial code that is executed when the user enters the Main.asp page. If the cookie is empty or if the session is invalid, the user is redirected to breach.htm, which indicates that a violation has occurred and that he/she has been logged off. All pages that need to be secure use this code.

The following sections provide an in-depth discussion of the implementation of the RKB’s three basic features: adding a resource, searching for a resource, and browsing resources.

4.1 Add a Resource

Adding a resource to the repository is a three-step process (see Figure 4-3). Step I, performed by addPage1.asp, collects the resource to be added from the user. This could be a document (e.g., a teaching file), an ACR code, a pathological code,

PAGE 39

a normal measurement table, etc. If the resource submitted in this step already exists, then the user is taken back to addPage1.asp to add another resource. In Step II, the user completes some bookkeeping information on addPage2.asp. In Step III, the user completes an additional set of information specific to the resource on addPage3.asp. After Step III, the resource is permanently added to the repository. If the user abandons the process at any intermediate stage, the resource is not added to the repository and he/she is redirected to Main.asp. The database is also cleaned to prevent any dangling information.

Figure 4-3. Steps in adding a resource to the repository (flowchart: addPage1.asp, Step I, submit resource; if the resource already exists, return to addPage1.asp, otherwise continue to addPage2.asp, Step II, bookkeeping information, and addPage3.asp, Step III, resource-specific information, followed by Insert Record; * indicates that the user abandoned the process, which leads to Abandon.asp: resource not inserted, redirect to Main.asp)

In the following sections, assume that user pnair is adding a resource of type Thesis to the repository.
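The all-or-nothing behavior of the three steps can be sketched as follows. This is one possible way to obtain it, offered under stated assumptions rather than as the RKB's method: instead of cleaning the database after an abandoned attempt, the sketch defers all inserts to a single transaction at the end. It uses Python's sqlite3 module, cut-down tables, and a hypothetical `fileName` column and `add_resource` function.

```python
import sqlite3


def open_repository():
    """Create cut-down tables; fileName is a hypothetical duplicate-check key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE baseTable ("
        " docId INTEGER PRIMARY KEY, fileName TEXT UNIQUE, Title TEXT)"
    )
    conn.execute(
        "CREATE TABLE attributeDataTable ("
        " docId INTEGER, attributeName TEXT, Data TEXT)"
    )
    return conn


def add_resource(conn, file_name, bookkeeping, attributes):
    """Run Steps I to III; return the new docId, or None if nothing was added.

    Passing None for bookkeeping or attributes models a user who abandons
    the process before completing that step.
    """
    # Step I: reject a resource that already exists.
    exists = conn.execute(
        "SELECT 1 FROM baseTable WHERE fileName = ?", (file_name,)
    ).fetchone()
    if exists:
        return None
    # Abandoned before Step II or Step III: nothing has been written yet.
    if bookkeeping is None or attributes is None:
        return None
    # Steps II and III are written together; the 'with' block commits on
    # success and rolls back if any statement raises an exception.
    with conn:
        cursor = conn.execute(
            "INSERT INTO baseTable (fileName, Title) VALUES (?, ?)",
            (file_name, bookkeeping["Title"]),
        )
        doc_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO attributeDataTable VALUES (?, ?, ?)",
            [(doc_id, name, value) for name, value in attributes.items()],
        )
    return doc_id
```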

PAGE 40

4.1.1 Step I, Submit the Resource

In Step I the user submits the resource to the repository. Figure 4-4 shows addPage1.asp.

Figure 4-4. Step I – Screen shot of the Submit resource page

On this page the user locates the resource to be added on his/her local drive using the Browse button and selects the category into which the resource falls (in our example this is Thesis). This information is submitted by clicking the “Go to Step II” button. It is important to understand how the repository is organized by category id. Table 4-1 identifies where the resources are saved based on the category id.

Table 4-1. Automatic saving of documents based on their category name

Category Name           Category Id  Saved to location
Lectures                001          C:\KnowledgeBank\UploadedFiles\001
Journal/Serial Article  002          C:\KnowledgeBank\UploadedFiles\002
Thesis                  003          C:\KnowledgeBank\UploadedFiles\003
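The mapping in Table 4-1 from category id to storage folder can be sketched as follows. This is an illustrative sketch in Python, not the ASP script on addPage1.asp; the function names are hypothetical, and a temporary directory stands in for C:\KnowledgeBank\UploadedFiles so that the example does not depend on a particular machine.

```python
import tempfile
from pathlib import Path


def category_folder(upload_root, category_id):
    """Return the folder for a category id, creating it if it does not exist."""
    folder = Path(upload_root) / category_id
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def save_resource(upload_root, category_id, file_name, data):
    """Write the uploaded bytes into the folder that matches the category id."""
    # Keep only the final path component so the file stays inside the folder.
    target = category_folder(upload_root, category_id) / Path(file_name).name
    target.write_bytes(data)
    return target


def demo():
    """Save one hypothetical thesis file under category 003 and report where."""
    with tempfile.TemporaryDirectory() as root:
        saved = save_resource(root, "003", "thesis.pdf", b"demo")
        return saved.parent.name, saved.name
```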

PAGE 41

The ASP script on addPage1.asp categorizes the resource as shown in Table 4-1 above. If a folder for that category does not exist, a folder is created automatically and the resource is added.

It is important to discuss a key ASP feature at this point. Refer to Figure 4-4. Note that the string “contributorId=pnair&groupNum=7” is passed from the Main.asp page to the addPage1.asp page through the web page address. This string consists of name=value pairs, the way in which HTML forms pass data to a web server. The name portion of the pair is the identifier and the value portion is the quantity or text that is sent. Name=value pairs are an easy and reliable way to pass data and are widely used within database systems and HTML forms. If multiple name=value pairs are to be sent, they are separated by “&”. To inform the web server that name=value pairs follow, they are separated from the rest of the URL by “?”.

The syntax for a <form> element is: <form method="GET|POST" action="URL">. The method can be either POST or GET, which indicates the type of web server submission used. The main difference between these methods is that GET appends the data to the URL, so the web server receives them through the standard CGI variables (the query string), whereas POST sends the data in the body of the request, that is, directly on the server’s incoming data stream. The default option is GET.

The URL is parsed when it reaches the web server, and the name=value pairs are extracted into temporary ASP variables using the Request.QueryString() method provided by ASP. Similarly, any form data on an active server page can be extracted using the Request.Form() method.
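The name=value mechanism can be tried out with a short sketch. It uses Python's standard urllib.parse module as a stand-in for what the browser and ASP's Request.QueryString() do; the helper names `build_url` and `read_query` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(page, **pairs):
    """Append name=value pairs to a page address, joined by '&' after a '?'."""
    return page + "?" + urlencode(pairs)


def read_query(url):
    """Extract the name=value pairs from a URL into a dictionary of strings."""
    query = urlsplit(url).query
    # parse_qs returns a list per name; this sketch keeps the first value only.
    return {name: values[0] for name, values in parse_qs(query).items()}
```

Note that every value arrives as text: the group number 7 is received as the string "7" and must be converted before any numeric comparison.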

PAGE 42

4.1.2 Step II, Submit Bookkeeping Information

After clicking the “Go to Step II” button, the user is taken to addPage2.asp (see Figure 4-5). Here the user provides bookkeeping information such as the title of the resource, the source of the resource, the date on which the resource becomes visible to members, the resource’s expiration date, etc.

Figure 4-5. Step II – Submit bookkeeping information

The baseTable is populated with this information through an INSERT query.

4.1.3 Step III, Submit Meta-data Information

Figure 4-6 illustrates addPage3.asp and the attributes required to complete the process. All attributes shown on this page are specific to the resource’s category. For

PAGE 43

example, the title of thesis, author(s), institution, academic department, publication year, and keywords are the attributes specific to resources added to the thesis category. This page demonstrates the dynamic nature of our implementation. The attributes shown on this page can be added, modified, or deleted by administrators. Once the administrator has changed the database, the ASP page recognizes this change dynamically and displays the corresponding fields on the form, as shown in Figure 4-6.

Figure 4-6. Step III – Submit meta-data information

This design has the following advantages:

- Attributes can be added, modified, or deleted as and when required.
- There is no need to change any of the ASP pages, since the data are picked up dynamically from the database.
- The design gives users and administrators more flexibility in using the system.
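The idea of building the Step III form from the attributeTable can be sketched as follows. This is a hedged illustration in Python with sqlite3, not the code of addPage3.asp; the generated HTML, the `attr` field-name prefix, and the function names are assumptions made for the example.

```python
import sqlite3
from html import escape


def open_attribute_store():
    """Create a cut-down attributeTable holding two thesis attributes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attributeTable ("
        " catId TEXT, attributeId INTEGER,"
        " attributeName TEXT, entriesAllowed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO attributeTable VALUES (?, ?, ?, ?)",
        [("003", 1, "Title of Thesis", 1), ("003", 6, "Keywords", 7)],
    )
    return conn


def form_fields(conn, cat_id):
    """Return one HTML input per attribute currently defined for the category."""
    rows = conn.execute(
        "SELECT attributeId, attributeName FROM attributeTable"
        " WHERE catId = ? ORDER BY attributeId",
        (cat_id,),
    )
    return [
        '<label>{0} <input type="text" name="attr{1}"></label>'.format(
            escape(name), attr_id
        )
        for attr_id, name in rows
    ]
```

Because the fields are read at request time, a row added to the attributeTable by an administrator appears on the form at the next request without any change to the page code.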

PAGE 44

At the end of Step III, the resource is added to the repository. Main.asp displays this resource the next time this user logs in.

4.2 Search Resources

The search page provides one basic and two advanced search facilities.

4.2.1 Basic Search

When a user wants to search the resources, the criteria that first come to mind are the name of the resource, the author, or keyword(s) describing the resource. The basic search facility shown in Figure 4-7 allows users to search resources using any of these fields.

Figure 4-7. Basic search page

When the search button is clicked, the script in the ASP page establishes a connection with the database using ODBC. An SQL query created from the search criteria submitted by the user is sent over the ODBC bridge to the database. The query is executed and the result is returned to an ASP file. The results are then

PAGE 45

wrapped in HTML and sent to the user’s browser. An example of the SQL query generated when searching for resources by keywords is shown in Figure 4-8. In this example, the user is searching for the keyword “renal.” The generated query is a nested SQL query, since the desired attributes span two different tables. The outer SELECT statement chooses the fields from the baseTable that need to be displayed in the search result. The inner SELECT query retrieves the docId values from the attributeDataTable that have an attributeName equal to “Keywords” and a value containing “renal.” The LIKE operator in combination with the “%” symbol matches all records containing the text “renal” anywhere in the keywords. Thus records like “Metastatic renal cell carcinoma” and “Transitional cell carcinoma of right-renal-pelvis” are returned. The attributeDataTable.docId is then matched against baseTable.docId, and the records that satisfy this condition are returned.

    SELECT baseTable.*
    FROM baseTable
    WHERE docId IN
        (SELECT attributeDataTable.docId
         FROM attributeDataTable
         WHERE ((attributeDataTable.attributeName = 'Keywords')
                AND ((attributeDataTable.Data) LIKE '%renal%')));

Figure 4-8. SQL query executed for keyword-based search

4.2.2 Advanced Search

The basic search can return a large number of records, since it cannot use more complicated queries that narrow the search to the specific results needed by the user. The following example illustrates this. The query:

    SELECT * FROM baseTable WHERE Authors LIKE '%Jan%';

PAGE 46

returns records with author names of “Jane Miller,” “Janet Majors,” “Rebecca Jankowitz,” and many others containing the text “Jan” anywhere in the author’s name. Suppose the user wants to search for a Journal/Serial Article written by “Brussel B.” This cannot be done with the basic search. For such queries, two advanced search facilities are provided.

Using the first of the advanced searches, shown in Figure 4-9, users can narrow their search by entering a resource type, an author’s name, and/or a title and selecting the match criterion “Exact Match” or “Words must appear in Title.”

Figure 4-9. Advanced search page I

The second advanced search allows users to refine their search criteria further using the boolean operators AND and OR. Users can use any combination of resource

PAGE 47

type, author, or keywords. Figure 4-10 shows the search facility and the fields entered by a user for the sample query below:

Find all resources in the Journal/Serial category with the title “Tuberculosis – Pulmonary in MIMS Disease” OR resources by a contributor whose name includes “sistc.”

Figure 4-10. Advanced search page II

The SQL query generated for this search is shown in Figure 4-11. The search results are displayed on searchResultsI.asp and searchResultsII.asp for advanced search I and advanced search II, respectively. Links to download the identified resources from the repository are also provided.
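The nested keyword query of Figure 4-8 can be exercised with a small sketch. It is an illustration under stated assumptions: Python's sqlite3 module replaces the ODBC connection to Access, the sample rows are invented for the example, and the keyword is bound as a parameter rather than pasted into the SQL text, which is a substitution made here to avoid building SQL from raw user input.

```python
import sqlite3


def open_demo_repository():
    """Create cut-down tables with two invented sample resources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baseTable (docId INTEGER PRIMARY KEY, Title TEXT)")
    conn.execute(
        "CREATE TABLE attributeDataTable ("
        " docId INTEGER, attributeName TEXT, Data TEXT)"
    )
    conn.executemany(
        "INSERT INTO baseTable VALUES (?, ?)",
        [(1, "Renal case A"), (2, "Chest case B")],
    )
    conn.executemany(
        "INSERT INTO attributeDataTable VALUES (?, ?, ?)",
        [
            (1, "Keywords", "Metastatic renal cell carcinoma"),
            (2, "Keywords", "Pneumothorax"),
        ],
    )
    return conn


def search_by_keyword(conn, keyword):
    """Return (docId, Title) for resources whose Keywords contain the keyword."""
    # '%' on both sides matches the keyword anywhere in the text. Wildcard
    # characters typed by the user are not escaped in this sketch.
    pattern = "%" + keyword + "%"
    rows = conn.execute(
        "SELECT baseTable.docId, baseTable.Title FROM baseTable"
        " WHERE baseTable.docId IN ("
        "   SELECT attributeDataTable.docId FROM attributeDataTable"
        "   WHERE attributeDataTable.attributeName = 'Keywords'"
        "   AND attributeDataTable.Data LIKE ?)"
        " ORDER BY baseTable.docId",
        (pattern,),
    )
    return rows.fetchall()
```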

PAGE 48

    SELECT baseTable.*
    FROM baseTable
    WHERE ((baseTable.catName = 'Journal/Serial Article'
            AND baseTable.contributorId = 'sistc')
           OR (baseTable.Title = 'Tuberculosis - Pulmonary in MIMS Disease'));

Figure 4-11. The SQL query generated from the advanced search shown in Figure 4-10

4.3 Browse Resources

Sometimes users may want to preview a list of resources in a certain category instead of using the search engine directly. The browse resources facility provides a predetermined hierarchical index of the subject categories. When a user clicks on one of the category links, the system provides a list of resources in that category. A link to information specific to each resource is also provided. This can help users narrow their search by suggesting subject terms, keywords, and other attributes to use on one of the search pages. The resource tree also provides a link to download any resource.

The browse resource facility provides a road map to both sub-topics and broadly defined topics. We call this view a “dynamic table of contents view,” since the categories and the resources in the categories are generated dynamically and displayed like a table of contents in a thesis or journal paper. If administrators add a new category, the ASP page recognizes the change dynamically and includes this information in the display.
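The dynamic table of contents view can be sketched as a query that groups resources under their categories at request time. This is a hedged illustration in Python with sqlite3; the text does not list the columns of the categoryTable, so the single `catName` column used here, the sample rows, and the function names are assumptions.

```python
import sqlite3


def open_demo_repository():
    """Create a hypothetical categoryTable and a cut-down baseTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categoryTable (catName TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE baseTable ("
        " docId INTEGER PRIMARY KEY, catName TEXT, Title TEXT)"
    )
    conn.executemany(
        "INSERT INTO categoryTable VALUES (?)", [("Lectures",), ("Thesis",)]
    )
    conn.execute(
        "INSERT INTO baseTable (catName, Title)"
        " VALUES ('Thesis', 'Implementation of a repository')"
    )
    return conn


def table_of_contents(conn):
    """Map every category to the sorted titles of the resources it holds."""
    contents = {}
    rows = conn.execute(
        "SELECT c.catName, b.Title FROM categoryTable AS c"
        " LEFT JOIN baseTable AS b ON b.catName = c.catName"
        " ORDER BY c.catName, b.Title"
    )
    for cat_name, title in rows:
        titles = contents.setdefault(cat_name, [])
        if title is not None:  # an empty category yields one row with no title
            titles.append(title)
    return contents
```

The LEFT JOIN keeps categories that hold no resources yet, so a category added by an administrator shows up in the listing immediately.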

PAGE 49

Figure 4-12. Screen shot of the Browse resource page

Figure 4-12 illustrates the browse resource page, and Figure 4-13 shows the results page when a user clicks the Thesis link to view the list of resources in this category.

Figure 4-13. The list of resources in the Thesis category

PAGE 50

4.4 Summary and What Is Next

We presented an in-depth discussion of the three basic features of our RKB implementation: adding, searching, and browsing resources. We also examined how some of the generated SQL queries work. In the next chapter we provide a brief review of the RKB and its contributions. We conclude the chapter by suggesting some future extensions to the present RKB design.

PAGE 51

CHAPTER 5
CONCLUSION AND FUTURE WORK

Centralization and easy retrieval of resources have been important research topics for the past couple of years. Many comprehensive solutions for categorizing and extracting information have exhibited shortcomings. This thesis examined the building of a repository that overcomes these limitations by classifying resources into categories. The approach combined database concepts for storing meta-data with a user-friendly interface. This thesis presented an overview of the problem, related research and the technologies used, the database schema developed, and an examination of the implemented Radiology Knowledge Bank (RKB). The following section provides a final review of the RKB’s contributions and some possible future extensions.

5.1 RKB and Its Contributions

This thesis presented one approach to representing semantic information from resources that are added to the repository. The information stored as meta-data is used when searching and browsing the resources. Another important contribution is the categorization of resources based on their “type,” which provides a hierarchical organization of the resources. Two features make this implementation unique: system administrators can add, delete, or modify attributes dynamically, and documents are placed automatically into the correct hierarchy of categories. Together these demonstrate the flexibility of our implementation. In addition, the generic definition of documents as “resources” makes it possible to support different kinds of documents.

PAGE 52

RKB presents a user-friendly interface, which should make it easy to accept in the medical community. The UNIX-based group-level access to resources protects the privacy of medical data by channeling users based on their group numbers. For broadly defined topics, the browse resource facility provides users with a dynamic table of contents view of the resources organized by category. RKB provides a basic search facility to locate resources by author name, keywords, or title. Users can apply the boolean operators AND and OR using the two advanced search facilities, which allow them to narrow their search. Though our approach to implementing the knowledge bank is successful and valuable to research in the field of online repositories, our implementation has some limitations. The following section presents some of these shortcomings and suggestions for possible extensions.

5.2 Limitations and Future Work

Though the RKB provides a good search engine for finding resources in our repository, it is not designed to rank results by relevancy or to return results efficiently. SQL queries that employ “like %” constructs have long execution times and return large sets of results. The user must then search through the returned results to find the desired resources. The following subsections discuss some possible enhancements that would overcome this limitation and make RKB a more efficient and powerful system.

5.2.1 Synonyms and Spelling Errors

Synonyms exist for many English words and medical terms. A user looking for resources involving the keyword “heart” will be limited to those containing that word only. There might be other resources that are more relevant to “heart” which use the

PAGE 53

medical synonym “cardio” [Sriram 2002]. A possible enhancement would be to build a database of synonyms and let the RKB search on the keyword and its synonyms.

Any search in RKB is initiated by typing words in text boxes. Typing always introduces a chance of spelling mistakes that could lead to finding either irrelevant resources or none at all. For example, if the word “renal” were misspelled as “renel,” the search result would be empty or would contain completely irrelevant resources. The Google search engine detects likely typographical errors automatically and searches using the corrected spelling. Similar correction could improve the speed and accuracy of searches and would be a useful future extension to the search feature.

5.2.2 Rank Based Searches

As the size of the repository grows with more and more members submitting resources, searches may take a significant amount of time to execute and display their results. The “like %” constructs in SQL queries can search for keywords within long sentences. However, their disadvantage is that they slow query processing. This is not desirable since RKB is a growing database. A possible solution to this problem is to use an indexing scheme like the one in Sriram’s implementation of WebSE [Lakshmi 2002]. A set of special tables called pseudo-indexes is used to index important fields in the database. These pseudo-indexes form the metadata of a 3-level indexing scheme. This type of implementation can expedite the search process for a system that does not support full text indexing.

Another feature that can be added to the search facility is ordering the search results by relevancy. Some of the ranking schemes that could be applied are:

PAGE 54

Word Frequency. Resources with more occurrences of the keyword can be assumed to be more relevant than ones with fewer occurrences. This principle of “word frequency” is used in almost all major search engines and can be adapted to our RKB search engine.

Location of Occurrence and PageRank. Search engines like AltaVista and Google consider the location of a word in the text when ranking search results [Eyleen 1997]. Google uses the PageRank system [Page et al.], which relies on a web crawler to build a database of link indexes as it crawls the web. These indexes are then ranked to show their relevance. Documents having the keyword in the heading obtain a larger weight than ones with the keyword towards the end. A similar strategy could be employed in RKB.

Medline [Medline 2002] provides a large database of category names to the medical community. This database could be modified and adapted to our RKB to provide a deeper categorization.

In the current system, administrators use forms and wizards provided by Microsoft® Access to perform administrative tasks like adding, modifying, or deleting categories and attributes. It would be useful to perform these functions via the web. ASP scripts to build these administrative pages are currently under development.
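The synonym lookup suggested in Section 5.2.1 and the word-frequency ranking suggested in Section 5.2.2 can be sketched together in a few lines. The sketch is written in Python for illustration and is not part of RKB. The synonym table, the function names, and the sample resources are hypothetical, and the choice to count only whole-word, case-insensitive matches is an assumption made here rather than a design taken from this thesis.

```python
import re
from collections import Counter

# Hypothetical synonym table; in RKB this would live in a database table.
SYNONYMS = {"heart": {"cardio"}}


def expand_terms(keyword):
    """Return the lower-cased keyword together with any listed synonyms."""
    keyword = keyword.lower()
    return {keyword} | SYNONYMS.get(keyword, set())


def word_frequency(text, terms):
    """Count whole-word, case-insensitive occurrences of the terms in text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[term] for term in terms)


def rank_resources(resources, keyword):
    """Rank (title, text) pairs by descending frequency of the keyword.

    Resources with no occurrence are dropped. Ties keep their input
    order because sorted() is stable.
    """
    terms = expand_terms(keyword)
    scored = [(word_frequency(text, terms), title) for title, text in resources]
    hits = [(score, title) for score, title in scored if score > 0]
    return [title for score, title in sorted(hits, key=lambda pair: -pair[0])]


if __name__ == "__main__":
    sample = [
        ("A", "Renal imaging notes"),
        ("B", "Heart valve imaging"),
        ("C", "Cardio exercises and heart rate"),
    ]
    print(rank_resources(sample, "heart"))
```

In this sample, resource C ranks above B for the keyword “heart” because it contains both the keyword and its synonym, while A is dropped. Whole-word matching means that “cardio” would not match a longer word such as “cardiovascular”; a production version would need stemming or prefix matching to cover such cases.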

PAGE 55

APPENDIX
CODE LISTING

AddPage1.asp code

<%@LANGUAGE="VBSCRIPT"%>
<%
Dim selectCategoryRSet
Dim selectCategoryRSet_numRows

Set selectCategoryRSet = Server.CreateObject("ADODB.Recordset")
selectCategoryRSet.ActiveConnection = MM_kbConn_STRING
selectCategoryRSet.Source = "SELECT categoryName FROM categoryTable"
selectCategoryRSet.CursorType = 0
selectCategoryRSet.CursorLocation = 2
selectCategoryRSet.LockType = 1
selectCategoryRSet.Open()

selectCategoryRSet_numRows = 0
%>
 
<%
selectCategoryRSet.Close()
Set selectCategoryRSet = Nothing
%>

AddPage2.asp code

<%@LANGUAGE="VBSCRIPT"%>
<%
' *** Edit Operations: declare variables
Dim MM_editAction
Dim MM_abortEdit
Dim MM_editQuery
Dim MM_editCmd
Dim MM_editConnection
Dim MM_editTable
Dim MM_editRedirectUrl
Dim MM_editColumn
Dim MM_recordId
Dim MM_fieldsStr
Dim MM_columnsStr
Dim MM_fields
Dim MM_columns
Dim MM_typeArray
Dim MM_formVal
Dim MM_delim
Dim MM_altVal
Dim MM_emptyVal
Dim MM_i

MM_editAction = CStr(Request.ServerVariables("SCRIPT_NAME"))
If (Request.QueryString <> "") Then
  MM_editAction = MM_editAction & "?" & Request.QueryString
End If

PAGE 59

' boolean to abort record edit
MM_abortEdit = false

' query string to execute
MM_editQuery = ""
%>
<%
' *** Insert Record: set variables
If (CStr(Request("MM_insert")) = "baseTableForm") Then

  MM_editConnection = MM_kbConn_STRING
  MM_editTable = "baseTable"
  MM_editRedirectUrl = "addResource2.asp"
  MM_fieldsStr = "txtTitle|value|selectSource|value|selectYesNo|value|txtVisible|value|txtExp|value|txtDel|value|txtAdd|value|primaryID|value|cont|value|gp|value|catName|value|linkToDestination|value"
  MM_columnsStr = "Title|',none,''|Source|',none,''|openToPublic|none,Yes,No|visibleDate|',none,NULL|expiryDate|',none,NULL|deleteDate|',none,NULL|addedDate|',none,NULL|docId|',none,''|contributorId|',none,''|groupNum|',none,''|catName|',none,''|linkToDest|',none,''"

  ' create the MM_fields and MM_columns arrays
  MM_fields = Split(MM_fieldsStr, "|")
  MM_columns = Split(MM_columnsStr, "|")

  ' set the form values
  For MM_i = LBound(MM_fields) To UBound(MM_fields) Step 2
    MM_fields(MM_i+1) = CStr(Request.Form(MM_fields(MM_i)))
  Next

  ' append the query string to the redirect URL
  ' Prashu added from here
  MM_editRedirectUrl = MM_editRedirectUrl & "?docID=" & Request.Form("primaryID")
  'If (MM_editRedirectUrl <> "" And Request.QueryString <> "") Then
  '  If (InStr(1, MM_editRedirectUrl, "?", vbTextCompare) = 0 And Request.QueryString <> "") Then
  '    MM_editRedirectUrl = MM_editRedirectUrl & "?" & Request.QueryString
  '  Else
  '    MM_editRedirectUrl = MM_editRedirectUrl & "&" & Request.QueryString
  '  End If
  'End If

End If
%>
<%

PAGE 60

' *** Insert Record: construct a sql insert statement and execute it
Dim MM_tableValues
Dim MM_dbValues

If (CStr(Request("MM_insert")) <> "") Then

  ' create the sql insert statement
  MM_tableValues = ""
  MM_dbValues = ""
  For MM_i = LBound(MM_fields) To UBound(MM_fields) Step 2
    MM_formVal = MM_fields(MM_i+1)
    MM_typeArray = Split(MM_columns(MM_i+1), ",")
    MM_delim = MM_typeArray(0)
    If (MM_delim = "none") Then MM_delim = ""
    MM_altVal = MM_typeArray(1)
    If (MM_altVal = "none") Then MM_altVal = ""
    MM_emptyVal = MM_typeArray(2)
    If (MM_emptyVal = "none") Then MM_emptyVal = ""
    If (MM_formVal = "") Then
      MM_formVal = MM_emptyVal
    Else
      If (MM_altVal <> "") Then
        MM_formVal = MM_altVal
      ElseIf (MM_delim = "'") Then ' escape quotes
        MM_formVal = "'" & Replace(MM_formVal, "'", "''") & "'"
      Else
        MM_formVal = MM_delim + MM_formVal + MM_delim
      End If
    End If
    If (MM_i <> LBound(MM_fields)) Then
      MM_tableValues = MM_tableValues & ","
      MM_dbValues = MM_dbValues & ","
    End If
    MM_tableValues = MM_tableValues & MM_columns(MM_i)
    MM_dbValues = MM_dbValues & MM_formVal
  Next
  MM_editQuery = "insert into " & MM_editTable & " (" & MM_tableValues & ") values (" & MM_dbValues & ")"

  If (Not MM_abortEdit) Then
    ' execute the insert
    Set MM_editCmd = Server.CreateObject("ADODB.Command")
    MM_editCmd.ActiveConnection = MM_editConnection
    MM_editCmd.CommandText = MM_editQuery
    MM_editCmd.Execute
    MM_editCmd.ActiveConnection.Close

PAGE 61

    If (MM_editRedirectUrl <> "") Then
      Response.Redirect(MM_editRedirectUrl)
    End If
  End If

End If
%>
<%
Dim Recordset1
Dim Recordset1_numRows

Set Recordset1 = Server.CreateObject("ADODB.Recordset")
Recordset1.ActiveConnection = MM_kbConn_STRING
Recordset1.Source = "SELECT MAX(docId) FROM baseTable"
Recordset1.CursorType = 0
Recordset1.CursorLocation = 2
Recordset1.LockType = 1
Recordset1.Open()

Recordset1_numRows = 0
%>