Citation
Web-based search engine for radiology teaching file

Material Information

Title:
Web-based search engine for radiology teaching file
Creator:
Lakshmi, Shriram ( Dissertant )
Dankel, Douglas D. ( Thesis advisor )
Sistrom, Chris ( Reviewer )
Wilson, Joseph N. ( Reviewer )
Place of Publication:
Gainesville, Fla.
Publisher:
University of Florida
Publication Date:
Copyright Date:
2002
Language:
English
Physical Description:
ix, 51 p.

Subjects

Subjects / Keywords:
Database design ( jstor )
Databases ( jstor )
Indexing ( jstor )
Keyword searching ( jstor )
Physicians ( jstor )
Radiology ( jstor )
Search engines ( jstor )
Search terms ( jstor )
SQL ( jstor )
Web servers ( jstor )
Computer and Information Science and Engineering thesis, M.S
Dissertations, Academic -- UF -- Computer and Information Science and Engineering
Radiography, Medical -- Digital techniques
User interfaces (Computer systems)
Web search engines

Notes

Abstract:
Teaching in any field can be successful only with the help of good illustrations. This is especially true in the medical field, where even an excellent professor can sound vague, without good examples. As a result, the academic medical community started collecting together illustrative images of cases in a database. This database is known as the "Radiology Teaching File" or simply "RTF" and has proven to be a great asset to academic radiology departments. However, as the size of the database increases, maintaining the database and searching the database for cases with a particular characteristic become time-consuming and cumbersome. Hence, there is a need for a system that can provide an easy to use interface to maintain the database and also a retrieval system that can fetch the relevant cases. The focus of this thesis is to provide an easy-to-use search engine that can search the RTF efficiently. This search engine, called WebSE (Web-based Search Engine), was developed with usability, versatility, and scalability in mind. The user can search the database of cases using an easy-to-use general search, like Google (the web search engine), or the user can use a more complicated interface to run a more precise search. In the former, a relevancy ranking mechanism helps the user to see the most important cases first. As a result, this tool helps in making teaching and learning more effective. ( ,, )
Subject:
engine, file, radiology, ranking, relevancy, search, teaching
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
General Note:
Title from title page of source document.
General Note:
Includes vita.
Thesis:
Thesis (M.S.)--University of Florida, 2002.
Bibliography:
Includes bibliographical references.
General Note:
Text (Electronic thesis) in PDF format.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Lakshmi, Shriram. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
12/27/2005
Resource Identifier:
1104391441 ( OCLC )

Downloads

This item has the following downloads:


Full Text












WEB-BASED SEARCH ENGINE
FOR
RADIOLOGY TEACHING FILE












By

SHIRIRAM LAKSTIMI


A THESIS PRESENTED TO THlE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THlE REQUIREMENTS FOR THlE DEGREE OF
MASTER OF SCIENCE

UNIVERSITY OF FLORIDA


2002




























Copyright 2002

by

Shriram Lakshmi




























To My Wonderful Family
















ACKNOWLEDGMENT S

I would like to thank Dr. Doug Dankel for encouraging me to start this work and providing timely guidance and help. I would also like to thank Dr. Chris Sistrom for spending a lot of time in reviewing our work and providing excellent feedback and suggestions right from the beginning. I also thank Dr. Joe Wilson for being in my committee and giving suggestions from time to time.

I would like to thank Ms. Aarathi Balakrishnan for her valuable criticism and

suggestions on my work. I also thank Mr. Balaji Krishnaprasad for providing me with the necessary resources for completing this thesis. I would also like to thank all my friends who have helped me in so many ways throughout my graduate study. Finally, I would like to thank my family for all their love, encouragement and support throughout my life.
















TABLE OF CONTENTS
pM. e

A CKN OW LED GM EN TS . iv

LIST OF FIGURE S . vii

A B S T R A C T . v iii

CHAPTER

I IN TRODU CTION . I

1 . 1 P ro b lem D o m ain . 2
1 .2 S y ste m D e sig n . 3
1.3 D esign A dvantages . 5
1.4 D ocum ent Structure . 6


2 BACKGROUND AND TECHNOLOGY OVERVIEW . 7

2.1 Existing RTF System . 7
2.2 Technologies U sed . 9
2.2.1 Relational D atabases . 10
2.2.2 Structured Query Language . 10
2.2.3 Stored Procedures . 11
2 .2 .4 T rig g e rs . 12
2.2.5 A ctive Server Pages . 13
2 .2 .6 V B S c rip t . 14
2.2.7 Open-D ata-Base-Connectivity (ODBC) . 14
2.3 Sum m ary and W hat Is N ext . 16


3 SY STEM FEA TURES AN D DESIGN . 17

3.1 M ain D atabase D esign . 17
3 .1 .1 T F T a b le . 1 8
3 .1 .2 Im ag e s T ab le . 19
3 .1 .3 O th er T ab le s . 2 0
3.1.3.1 Contributors . 20
3 .1 .3 .2 P atie n t In fo . 2 1
3 .1 .3 .3 M o d a lity . 2 1


v









3 .1 .3 .4 R a tin g . 2 1
3.1.3.5 H ospital Info . 21
3.2 W eb SE W although . 22
3.2.1 Free-Text Search . 22
3.2.2 A dvanced Search . 28
3.3 Sum m ary and W hat Is N ext . 29


4 IN SIDE W ebSE . 30

4.1 Indexing System . 32
4 . 1 .1 W o rd L ist . 3 2
4 .1 .2 W o rd In fo . 3 3
4 .1 .3 C o lu m n In fo . 3 5
4 .1 .4 Im ag e C o u n t . 3 5
4.2 Query Processing: H ow It W orks . 36
4.2.1 Traditional System . 37
4 .2 .2 W eb S E . 3 8
4.2.2.1 Single term search . 38
4.2.2.2 M ultiple term search . 39
4 .3 R a n k in g . 4 0
4.3.1 Frequency of Occurrence . 41
4.3.2 Colum n of Occurrence . 41
4.3.3 A djacency of W ords . 42
4.3.4 W eighted W ords . 42
4.4 Im plem entation of Ranking . 42
4.4.1 Ranking in Single W ord Searches . 42
4.4.2 Ranking in M ultiple W ord Searches . 43
4.4.2.1 A dditional strategy . 43
4.4.2.2 Im plem entation . 45
4.4 Sum m ary and W hat Is N ext . 46


5 CON CLU SION S AN D FU TURE W ORK . 47

5 .1 W e b S E . 4 7
5 .2 E x te n sio n s . 4 8
5 .2 .1 S y n o n y m s . 4 8
5.2.2 Spelling M istakes . 48
5 .3 S u m m a ry . 4 9


LIST OF REFEREN CES . 50

BIOGRAPH ICAL SKETCH . 51
















LIST OF FIGURES

Figure pM. e

I -I H igh Level D esign . 4

2-1 Existing RTF Architecture . 8

3-1 M ain D atabase D esign . 18

3-2 Basic Search Screen . 22

3-3 Search Results Screen . 23

3-4 Case D etails Screen . 25

3-5 View Im ages Screen . 26

3-6 Another View Im ages Screen . 27

3-7 A dvanced Search Screen . 29

4-1 The 3 -Layer Index Architecture . 31

4-2 Index Tables and Relationships . 32

4-3 RTF and Index Tables . 34




















vii















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science

WEB-BASED SEARCH ENGINE FOR RADIOLOGY TEACHING FILE By

Shriram Lakshmi

December 2002


Chair: Dr. Douglas D. Dankel 11
Major Department: Computer and Information Science and Engineering

Teaching in any field can be successful only with the help of good illustrations.

This is especially true in the medical field, where even an excellent professor can sound vague without good examples. As a result, the academic medical community started collecting together illustrative images of cases in a database. This database is known as the "Radiology Teaching File" or simply "RTF" and has proven to be a great asset to academic radiology departments.

However, as the size of the database increases, maintaining the database and

searching the database for cases with a particular characteristic become time-consuming and cumbersome. Hence, there is a need for a system that can provide an easy to use interface to maintain the database and also a retrieval system that can fetch the relevant cases. The focus of this thesis is to provide an easy-to-use search engine that can search the RTF efficiently.









This search engine, called Web SE (Web-based Search Engine), was developed with usability, versatility, and scalability in mind. The user can search the database of cases using an easy-to-use general search, like Google (the web search engine), or the user can use a more complicated interface to run a more precise search. In the former, a relevancy ranking mechanism helps the user to see the most important cases first. As a result, this tool helps in making teaching and learning more effective.














CHAPTER I
INTRODUCTION

The academic medical community has always felt the need for a database of cases for illustration. A description of any anomaly is never complete without a few practical case reviews. These case reviews are a must, not only for making the students better understand the pathologic condition at hand, but also for serving to expose the student to some real world cases before the student begins to actually diagnose real patients.

A case usually includes a few X-rays, and/or other forms of medical images, of the affected part of the patient. There is no substitute for these images while teaching. The statement "A picture speaks a thousand words" is especially true here, since medical images are complex and the student's attention needs to be driven to the right details.

To make teaching effective and complete, every medical faculty member needs a

database of cases illustrating each pathologic condition. This database of illustrative cases is called the "Radiology Teaching File," referred to as RTF from this point on. The RTF can be used to store not only cases that are a perfect example of a condition but also cases that are exceptions to the general behavior. This makes the new generation of physicians aware of those exceptions and more prepared to handle them.

Considering that physicians constantly add cases to the RTF and search it

frequently for cases of a particular kind, it is imperative to have a system that can make this process easy and quick. This leads to better use of the physician's time and also leads to overall effectiveness of teaching.









1.1 Problem Domain

The RTF is an ever-growing database, with a wide variety of cases. Each case has its own set of images. For example, the RTF at the Department of Radiology, University of Florida, has around twenty thousand patient cases with each case having six images on average. In addition, new cases are added on a daily basis. This makes maintaining all the image files and the database of cases cumbersome.

Additionally, the size of the RTF places a burden on the physicians when they attempt to locate a case with a particular characteristic. Traditionally, cases of interest were selected from a list of all available cases, which could be viewed sequentially or searched based on special codes assigned to the cases, like the ACR code.' Obviously, this involves examining many unrelated cases before finding the right case.

In such a scenario, the RTF seems to create more problems than it solves. The RTF can prove useful only if the physicians find it easy to add cases, maintain the database, and retrieve cases quickly [Dugas et al. 2001] and efficiently. The user-interface of the system should not require extensive computer knowledge, since its primary users are physicians. At the same time, the system should be quick, efficient, and require little or no maintenance.

This thesis looks closely at the problem of "case retrieval," while Ms. Aarathi Balakrishnan examines the user-interface and maintenance of the system in her thesis "Design and Analysis of User-Interface for RTF" [Balakrishnan 2002]. The "case retrieval" system is referred to as WebSE, standing for Web-based Search Engine, from this point on.


1 Members of the American College of Radiology evolved the ACR codes to code images in a teaching film library.









The Microsoft Access version of the RTF, currently used at the University of Florida, has a limited search capability. Though this system provides an easy-to-use interface, it is greatly inhibited by the limitations in the underlying software. WebSE aims to eliminate the shortcomings of this system by using a new design while at the same time providing a user-friendly interface.

1.2 System Design

WebSE was developed with the above requirements in mind. It is a databasedriven, web-based search engine. The two main components of the overall design are the Data Server and the Web Server. Figure 1. 1 illustrates the overall system design. First, let us look at the two components, examining how they work together.

The data server consists of two databases. The main database consists of all the

patient cases and is called the RTF. In addition to this database, there is another database that is added to speed up query processing in the RTF. This is called the "pseudo-index" database, which contains meta-data about the data in RTF. The data server stores all the data using "SQL Server."

The web server accepts queries from the World Wide Web sent to it from client

browsers. A set of scripts, written in VB Script, resides on this web server. These scripts are also called Active Server Pages or ASP. Clients send their requests through a browser to the web server. The scripts on the web server interact with the data server and respond to the client by sending the search results back to the client.






































Client Browser it Browser Client Brows Chel Br
"er I I


Figure 1-1 High Level Design

Let us examine how the two components work together at run-time. Firstly, the user inputs his query on the browser interface on his computer. This query (which is usually a set of keywords) is sent over the World Wide Web to the web server. When the web server receives a request from a client, it initiates an instance of the search script and executes it. This script, which runs totally on the web server, first establishes a connection with the data server, running SQL Server, using an ODBC (Open Data Base Connectivity) connection. Then the script analyses the query sent by the user and generates a SQL query.









The script sends this generated SQL query to the database server through the

ODBC connection. The instance of the SQL Server, running on the server, accepts this query from the script, executes the query, and returns the result through the ODBC connection to the ASP script. The ASP script formats the result into HTML (Hyper Text Markup Language), which is sent over the World Wide Web back to the client browser initiating the request. The browser on the client's machine renders the HTML source code and presents it to the user in the proper fonnat.

1.3 Design Advantages

There are a number of design advantages to this architecture. First, the userinterface offers a whole array of options to the user for searching. The user can enter a few keywords like in Google or can use an advanced search to search by image modality, age, or any other possible field in the database. The user is also provided with a special code search through which the user can build an ACR code [ACR 94] and search the RTF using that code. This ensures versatility of the search engine.

Second, the user-interface is designed by modeling the existing system. This eases the transition for the physicians from the old system to WebSE. For the additional components in this design, like the keyword search page, the user-interface is modeled after existing popular interfaces, like the interface presented by the Google search engine. This ensures user-friendliness of WebSE.

Third, when the user is presented with the results, the results are sorted in order of decreasing relevance. In this way, the user receives the most relevant case on top of the list. This can be a big time-saver for the user when the database is really large or when the retrieved result set has many cases.









This system is also scalable. The use of pseudo-indexing greatly reduces the processing time involved in searching the database. In addition to making the search quicker, it also makes the search less dependent on the size of the database. This ensures that system performance does not degrade as the size of RTF increases.

Finally, since this WebSE is a web-based system, to take advantage of the RTF, users only need to have a browser installed on their client systems. Also, the client can run the browser on any platform.

1.4 Document Structure

The following chapters look at the design and performance of WebSE and its interface. Chapter 2 discusses the existing teaching file system at the University of Florida and the technologies needed to develop WebSE. Chapter 3 looks at WebSE from the user-perspective, examining the user-interface. Chapter 4 provides an in-depth view of the architecture of WebSE. Chapter 5 presents conclusions and proposes future extensions.














CHAPTER 2
BACKGROUND AND TECHNOLOGY OVERVIEW

The existing implementation of the RTF system in the Department of Radiology at University of Florida has a reasonably good user-interface. Unfortunately, the underlying technology used, inhibits its capabilities to a great extent. This chapter looks at the design of this system and examines its pros and cons. Then, it examines the software tools used for implementing Web SE.

2.1 Existing RTF System

The existing implementation of the teaching file uses a Microsoft Access database. The user-interface was also implemented in Microsoft Access, with the help of Microsoft Access "forms." The user-interface was basic and simple with most of the functionalities provided. Since many people wanted access to this well-designed teaching file it was expanded to include sharing. Before looking at the architecture of this design, we need to understand "linking" as defined in Microsoft Access Help.

In an Access database, "linking data enables you to read and in most cases, update data in the external data source without importing. The external data source's format is not altered so that you can continue to use the file with the program that originally created it, but you can add, delete, or edit its data by using Microsoft Access as well. You can also link tables from other Microsoft Access databases. For example, you might want to use a table from another Microsoft Access database that is shared on a network" [Microsoft 2000b]. This is particularly useful if you want to store all of your tables in one









database on a network server, while keeping forms, reports, and other objects in a separate database that is copied among users of the shared database.

This concept of linking was used here. The actual data was stored in a separate

database. This database had only the tables of the teaching file. The forms and other userinterface related material was placed in another database, which is referenced as the UI database from this point on. This UI database linked to the main database to access the data. Now, the data was present in one central location, and the UI was distributed among all the clients.




TF Database with Data














til Database til Database til Database



Figure 2-1 Existing RTF Architecture

The following are some of the distinct advantages of this architecture.

" The UI access database is one small file, which can be distributed among the users
of the teaching file easily.

" Since the data is in one central location, the problem of duplication is eliminated. In
other words, everyone works on the same data, even though they are accessing it
from different systems.









Any change made to the data by any user (additions, deletions, or updates), will be
seen by all the other users immediately.

This proved to be an excellent, initial implementation of the teaching file.

However, this design is not free from drawbacks, the following are some of its disadvantages.

" The most significant of these is that Microsoft Access does not support a large
number of simultaneous users, thereby limiting the number of people that can
access the database at one time.

" Every person who wants to access the teaching file must have Microsoft Access
installed on their system, which also means that the user must be running Windows. " Updating the user-interface is very cumbersome, because it involves updating all
the copies of the client database. This may even be impossible in some cases.

" Another drawback is that, Microsoft Access is intended for small databases only.
Beyond a certain size, its performance significantly degrades. With the size of the
teaching file growing all the time, this would become a problem eventually.

With more users wanting to access the teaching file, a system that could support more concurrent users and a system that could be platform independent needed to be developed. Also, the system needed to be scalable and to retain the user-friendly features of the current system.

2.2 Technologies Used

This section describes the various technologies used in WebSE. In WebSE, the main data resides on Microsoft SQL Server, also known as "Sequel Server" [Microsoft 2000c]. The following are some of the reasons why SQL Server was chosen for WebSE. " It is an industrial strength database management system, which can handle far
bigger databases than Microsoft Access.

" It provides a more robust access to multiple users at the same time. " It supports stored procedures and triggers, which make database maintenance
easier.









It also has support for handling permissions and security. One can set permissions
for an individual column or for the whole database. This is especially important
because, some tables have confidential patient information, which need to be kept
private.

RTF stores the data in a relational format. SQL Server supports Relational Databases.

2.2.1 Relational Databases

Although there are different ways to organize data in a database, relational

databases are one of the most effective. Relational database systems are an application of mathematical set theory to the problem of effectively organizing data. In a relational database, data is collected into tables (called relations in relational theory) [Elmasri and Navathe 2001].

A table represents some class of objects that are important to an organization. For example, a company may have a database with a table for employees, another table for customers, and another for stores. Each table is built of columns and rows. Each column represents some attribute of the object represented by the table. Each row, or tuple, represents an instance of the object represented by the table.

When organizing data into tables, you can usually find many different ways to define tables. Relational database theory also defines a process called normalization, which ensures that the set of tables you define organizes your data effectively.

2.2.2 Structured Query Language

To work with data in a database, you have to use a set of commands and statements (language) defined by the DBMS software. Several different languages can be used with relational databases; the most common is SQL (Structured Query Language) [Elmasri and Navathe 2001]. The American National Standards Institute (ANSI) and the









International Standards Organization (ISO) define software standards, including standards for the SQL language [Microsoft 2000c]. SQL Server 2000, used for WebSE, supports the Entry Level of SQL-92, the SQL standard published by ANSI and ISO in 1992 [Microsoft 2000c]. The dialect of SQL supported by Microsoft SQL Server is called Transact-SQL (T-SQL) [Microsoft 2000c].

Note that "SQL Server" is the name of the database system sold by Microsoft and is very different from "SQL," which is a database access language standard. The two terms should not be confused with each other.

In addition to the above-mentioned features, WebSE also uses a couple of other

features provided by SQL Server. They are "Stored Procedures" and "Triggers." A brief description of these follows.

2.2.3 Stored Procedures

When you create an application with SQL Server, the Transact-SQL programming language is the primary programming interface between your applications and the SQL Server database. When you use Transact-SQL programs, two methods are available for storing and executing the programs. You can store the programs locally and create applications that send the commands to SQL Server and process the results, or you can store the programs as stored procedures in SQL Server and create applications that execute the stored procedures and process the results [Microsoft 2000c].

Stored procedures in SQL Server are similar to procedures in other programming languages in that they can

" Accept input parameters and return multiple values in the form of output
parameters to the calling procedure or batch.

" Contain programming statements that perform operations in the database, including
calling other procedures.









* Return a status value to a calling procedure or batch to indicate success or failure
(and the reason for failure).

The benefits of using stored procedures in SQL Server rather than Transact-SQL programs stored locally on client computers are

* They allow modular programming. You create a procedure once, store it in the
database, and call it any number of times in your program.

* They allow faster execution. If the operation requires a large amount of TransactSQL code or is performed repetitively, stored procedures can be faster than batches
of Transact-SQL code. They are parsed and optimized when they are created, and
an in-memory version of the procedure can be used after the procedure is executed
the first time. Transact-SQL statements are repeatedly sent from the client each time they run and are compiled and optimized every time SQL Server executes
them.

* They can reduce network traffic. An operation requiring hundreds of lines of
Transact-SQL code can be performed through a single statement that executes the
code in a procedure, rather than by sending hundreds of lines of code over the
network.

* They can be used as a security mechanism. Users can be granted permission to
execute a stored procedure even if they do not have permission to execute the
procedure's statements directly.

2.2.4 Triggers

SQL Server provides two primary mechanisms for enforcing business rules and

data integrity: constraints and triggers. A trigger is a special type of stored procedure that automatically takes effect when the data in a specified table is modified [Microsoft 2000c, Elmasri and Navathe 200 1]. A trigger is invoked in response to an INSERT, UPDATE, or DELETE statement. A trigger can query other tables and can include complex Transact-SQL statements. The trigger and the statement that fires it are treated as a single transaction, which can be rolled back from within the trigger. If a severe error is detected (for example, insufficient disk space), the entire transaction automatically rolls back.









Triggers are useful in the following ways

" Triggers can cascade changes through related tables in the database; however, these
changes can be executed more efficiently using cascading referential integrity
constraints.

" Triggers can enforce restrictions that are more complex than those defined with
CHECK constraints. Unlike CHECK constraints, triggers can reference columns in
other tables. For example, a trigger can use a SELECT from another table to
compare to the inserted or updated data and to perform additional actions, such as
modify the data or display a user-defined error message.

" Triggers can also evaluate the state of a table before and after a data modification
and take action(s) based on that difference.

" Multiple triggers of the same type (INSERT, UPDATE, or DELETE) on a table
allow multiple, different actions to take place in response to the same modification
statement.

2.2.5 Active Server Pages

The Web server component of Web SE supports Active Server Pages (ASP). The server-side scripts are written in VBScript [Microsoft 2000a]. A more detailed description of server side scripts and ASP follows.

Microsoft Active Server Pages (ASP) is a server-side scripting environment that you can use to create interactive Web pages and build powerful Web applications [Microsoft 2002]. When the server receives a request for an ASP file, it processes serverside scripts contained in the file to build the Web page that is sent to the browser. In addition to server-side scripts, ASP files can contain HTML (including related client-side scripts) as well as calls to COM components that perform a variety of tasks, such as connecting to a database or processing business logic.

A server-side script starts execution when a browser requests a. asp file from the Web server. The Web server then calls ASP, which processes the requested file from top to bottom, executes any script commands, and sends a Web page to the browser.









Because the scripts run on the server rather than on the client, the Web server does all the work involved in generating the HTMVIL pages sent to browsers. Server-side scripts cannot be readily copied because only the result of the script is returned to the browser. Users cannot view the script commands that created the page they are viewing. ASP supports various scripting languages including VBScript, which is used by Web SE.

2.2.6 VBScript

Microsoft Visual Basic Scripting Edition, the newest member of the Visual Basic family of programming languages [Microsoft 2000a], brings active scripting to a wide variety of environments, including Web client scripting in Microsoft Internet Explorer and Web server scripting in Microsoft Internet Information Server.

It has all the good features of Visual Basic making it easy to use and learn. At the same time, it blends very well with all the other components of WebSE. VBScript talks to host applications using ActiveX Scripting. With ActiveX Scripting, browsers and other host applications do not require special integration code for each scripting component. ActiveX Scripting enables a host to compile scripts, obtain and call entry points, and manage the namespace available to the developer. ActiveX Scripting is used in Internet Explorer and in IIS. VBScript and ActiveX Scripting can also be used as a general scripting language in other applications.

2.2.7 Open-Data-Base-Connectivity (ODBC)

The scripts interact with the Data Server over an ODBC connection. ODBC is a

standard definition of an application-programming interface (API) used to access data in relational or indexed sequential access method (ISAM) databases [Microsoft 2001]. SQL Server supports ODBC as one of the native APIs for writing C, C++, and Visual Basic applications that communicate with SQL Server.









ODBC enables a database to become an integral part of an application. SQL

statements can be incorporated into the application, allowing the application to retrieve and update values from a database. Values from the database can be placed in program variables for manipulation by the application. Conversely, values in program variables can be written to the database.

ODBC enables applications to access a variety of data sources, including a wide range of relational databases and local ISAM data. The ODBC driver for SQL Server is included with SQL Server.

ODBC defines a call-level interface, or CLI. A CLI is defined as a set of function calls and their associated parameters. A CLI definition uses a native programming language to call functions; therefore, a CLI requires no extensions to the underlying programming language. This contrasts with an embedded A-PI, such as Embedded SQL, where the API is defined as an extension of the source code for a programming language, and applications using the API must be precompiled in a separate step.

ODBC aligns with the following specification and standard for relational SQL database CLI definitions:

* The X/Open CAE specification Data Management: SQL Call-Level Interface (CLI)
[Microsoft 200 1] and

* ISO/IEC 9075-3:1995 (E) Call-Level Interface (SQL/CLI) [Microsoft 2001].

SQL Server programs that are written using the ODBC A-PI communicate with SQL Server through C function calls. The SQL Server-specific versions of the ODBC functions are implemented in a SQL Server ODBC driver. The driver passes SQL statements to SQL Server and returns the results of the statements to the application. ODBC applications are also interoperable with drivers for heterogeneous data sources.






16


2.3 Summary and What Is Next

This chapter examined the existing RTF system and its advantages and

disadvantages. It discussed the reasons for choosing SQL Server over Access and examined the various other technologies like Relational Database, Stored Procedures, SQL, Triggers, 11S, ASP, VBScript, and ODBC. The next chapter looks at WebSE from a user point of view and also includes a walkthrough.














CHAPTER 3
SYSTEM FEATURES AND DESIGN

The previous chapters presented a basic design overview of WebSE. This chapter takes a close look at Web SE from the user perspective. The actual inner design for improving performance and salability and also the implementation details are left for the next chapter. Here, we first examine the underlying database design to better understand the domain of Web SE. Then, we take a screen-by- screen walk through and analyze the functionality offered by WebSE. The User-interface is also scrutinized at every point.

3.1 Main Database Design

For any database driven project to be successful, the underlying database design

needs to be sound. Needless to say, this was the first phase of implementing this thesis, to analyze the tables and "normalize" the database design. Figure 3.1 captures the essence of the database design. It shows the main database tables and the relationships existing between them. A brief description of the significance of the main tables and their main fields follows. Note that this is a description of the data tables alone and does not include the pseudo-index tables that are added to the database for improving salability and performance.







































Figure 3-1 Main Database Design


3.1.1 TF Table

The TF table, or "Teaching File," is the main table that holds all the main data related to the cases, excluding image related information and other information like patient information, hospital information, etc. One tuple exists in this table for every different case. This table includes:

" CASE - NUMBER: This is a unique number assigned to every case in the teaching
file. In database terms this is referred to as the primary key of this table.

" Patient-id: This is a unique number assigned to every patient. Every patient can be
uniquely identified with this id. The patient related information is stored in another
table.









" AGE: This is the age of the patient in days. Since patients can include infants,
whose age will be in days, the age is stored in days. For other patients, to get the
age in years, we can always divide the age in days with 365.

" SEX-MALE: This is the gender of the patient. For males, a value of one is
assigned and a zero indicates a female patient.

" ANATOMIC - CODE: This is an anatomical code assigned to every case that is an
ACR code developed by the American College of Radiology.

" PATHOLOGIC-CODE: This is also part of the ACR code. It represents the
Pathology.

" DIAGNOSIS: This field has the final diagnosis of the case. The physician provides
this value when adding the case to the teaching file.

" CHIEF-COMPLAINT: In other words, "Symptoms," or the patient's initial
complaint.

" COMMENT: This field holds any other comments that the physician may want to
add about the case.

" CONTRIBUTOR: The id of the physician who entered this case. This is stored
because it is sometimes useful to know which physician entered this case into the
teaching file.

" IMAGANON: Holds the image-anonymous status. " Rating id: This is an optional field that the physician can enter, which identifies
how good an example this case is, on a scale of I to 10.

" HXNO: This is the id of the hospital in which this case occurred. The actual
hospital details are stored in another table.

" PRIVATE: This is a flag that sets a case as private or public. Only the contributor
of the case can view a private case.

3.1.2 Images Table

Every case can have more than one associated image. This table stores the

information related to all the images for all the cases. The "case-number" has a foreign key constraint and references the TF table's "case-number." The fields of this table are as


follows









* CASE-NUJMBER: This is the unique id of the case.

* MODALITY: This is the modality of the image. Every medical image has a
modality associated with it. Examples of image modality are X-ray, MIRI, CTScan,
etc.

* IMAGE -ANNOT: This is optional information that can be added to any image to
describe something about the image.

* ORIG -IMAGE LOC: This is the full network path of the image before it was
added to the teaching file.

* IMAGE DIR: It should be noted that in the teaching file, the images are all stored
in a separate set of directories. The database only holds the location to the images
and not the images themselves. This is the directory in which the image is presently
stored.

* IMAGE FILE NAME: The current file name of the image file.

* IMAGE -NUMBER: Since a given case can have more than one image, the images
should be ordered among themselves to preserve the order of the images. This field
contains this information.

The above two tables are the most commonly searched tables in the database, since most users are interested in searching for cases and their images. In addition to this main information, other information is stored in a different set of tables. These tables are rarely searched and only by those who have the required privileges. A brief description of these tables follows.

3.1.3 Other Tables

3.1.3.1 Contributors

This table holds information about all the contributors. Contributors are physicians with permissions to add to, delete from, and modify cases in the RTF. It contains information regarding their login, password, name, and the type of permissions they have to access the teaching file.









3.1.3.2 Patient-Info

This table holds patient information and is not open for public viewing. It should be noted that only the cases are open to the public, the patient information is to be kept confidential at all times. The contents of this table can be viewed only by the administrator or by contributors with appropriate permissions.

3.1.3.3 Modality

This is a lookup table storing all the image modalities and their abbreviation.

Medical images can be of different types and the type is called its modality. Examples of image modality are X-Ray, CT Scan, etc.

3.1.3.4 Rating

This lookup table stores rating descriptions for all possible ratings in the teaching file. A rating can be given to every case inserted into the RTF. It describes the quality of the case - some cases are very good illustrations of an anomaly and some are not that good. Having one lookup table for the ratings makes it possible for all the physicians adding cases to the RTF, to use the same scale for rating.

3.1.3.5 Hospital-Info

As the name suggests, this table holds hospital information. Right now, only the hospital name is present in this table, as only that was thought necessary, but this table can be extended to hold additional information about hospitals.

In addition to the above tables, there are a number of tables holding temporary information and lookup information. The tables mentioned above form the RTF.









3.2 WebSE Walkthrough

WebSE offers a wide range of search options to the user from very simple "freetext search" to a more complex "advanced search." Let us look at each of type of search and examine its features.

3.2.1 Free-Text Search

Whenever a user wants to search for a case, the first item that comes to his/her

mind is the name of the anomaly itself or a set of keywords describing this anomaly. The free-text search offers the user the ability to search the database of cases with these keywords, providing the user with easy and effective communication with the search engine. The user-interface is very similar to Google. This style is used because almost everyone is familiar with this interface, making it ideal for the medical community.


File Edit View Favorites Tools Help I
4-Back - 1[ - f c 2 Search MFavorites WMedia J %- di - ! l % R
AddressI htlp //Iocalhost/shrirarr/ankedSearchi htm J Go Links
University of Florida Dept of Radiology Teaching File



Advanced Search Search without Diaziosis











FF Folo, ,naiet Figure 3-2 Basic Search Screen


1'3http://localhostishriram/RankedGoogieSearchI.htm -Microsoft Internet Explorer I


i~ -_IF X i











Figure 3.2 shows the first screen of WebSE. The user enters a set of keywords


separated by spaces in the provided text box, then clicks on the search button or presses


the enter key. This submits the keywords to "RankedSearchl .asp" on the web server.

- -- . * - I - fi-.i
File Edit View Favorites Tools Help
-Baclk -+ ( 0 QSearch MlFavorites @Nedia F3 a' f I
Address 9 htt://Iocalhost/shriram/RankedSearchl asp 6'Go Links

University of Florida Dept of Radiology j
Teaching File


Icell carcinoma

Advanced Search Search without Dia0nosis

Search Results.

Diagiosis : Huge renal cell carcinoma with necrosis ACR CODE: 81.32 Total ages : 2
Modalities : UnClassified=2 Age of Pt: 50 Gender: Male
Chief Complaint : Incidental finding on pre heatt transplant abdomin Vie, Case Details

Diaiosis : Squamous cell carcinoma obstnicting tie RLL bronch ACR CODE: 6.32 Total hnaoes : 12
Modalities : UnClassified= 12 Age of Pt: 68 Gender: Female
Chief Complaint : Fever. SOB. Vie, Case Details

Diagiosis : Metastatic renal cell carcinonia. ACR CODE : 761.332 Total Inages : 13
Modalities :UnClassified 13 Age of Pt: 75 Gender: Female
Chief Complaint : Restaging scan. View, Case Details

Diagnosis : Left renal cell carcinonla. ACR CODE : 81.324 Total hnages : 9
Modalities : UnClassified=9 Age of Pt: 55 Gender: Male
Chief Complaint : Left flank pain Vie, Case Details

Diaiosis : Transitional cell carcinoma ofrighit renal pelvis. ACR CODE : 813.321 Total Ihaoes : 4
Modalities: CT=4 Age of Pt: 72 Gender: Female

FFiFeF3-L,3eislneanet Figure 3-3 Search Results Screen









The script in the asp file first establishes a connection with the database using an ODBC connection, then dynamically constructs a SQL query with the keywords submitted by the user. This query is sent over the ODBC bridge to SQL Server. SQL Server executes the query and returns to the asp file the results as a set of records. The results are wrapped in HTML and sent to the user's browser.

Figure 3.3 shows the screen displaying the search results. In the figure, the user has submitted the phrase "cell carcinoma." As can be seen in the search results, all the cases listed at the top contain the search phrase in the diagnosis field. This is because the search results have been sorted for relevance. We examine the strategies used to rank the cases based on relevance later in this chapter. The actual implementation of the strategy is explained in the next chapter.

There are three views that show the results to the user, during free-text search. These are

Search Results View,
Case Details View, and
Case Images View.

The user can see the search results in the first view. This screen does not give all

the details about all the cases. Only a few fields of each case are shown. If the user wants the complete details about a case, he can browse through the list and click on "View Case Details." This takes him to the second screen, which shows all the textual information available, about that particular case. The user can also click on "Total Images" to view the images one by one. This takes him to a screen where he can browse through the images for that case.










.-RO *. -- - - - JLX.
File Edit View Favorites Tools Help
-Baclk - + - 4 1 0 QSearch MFavorites aledia 3 ' FL r '15 _D Address19htto: //ocalhost/shri ram/viewCaseDetaiIs.asp?case number 695&search celI%20carcinorra&mark MarkO 6'Go Links Radiology Teaching File - Case Details


Name of Pt
Dia4iosis : Huge renal cell carcinolna with necrosis ACR CODE: 81.32
Total hnaoes : 2
Modalities : UnClassified=2
Age of Pt: 50 Gender: Male
Contributor : sistiomc Inmaganon : ALLAN
Chief Complaint : hicidental finding on pre heat ti'ansplant abdoiin Comnents : COMMENT: Sections fion left nephiectony specimen reveal renal cell carcinoma with extensive tumor necrosis and tumor extension to renal capsule. However, a thin rim of pe'icapsular adipose tissue is present (less than
0.2111111 to fie ttnor). No tuniol is identified iln the inked soft tissue lnargin. There is no ttnior present iii retLr margin. No Q'oss tumor identified in the renal arteiy or renal vein. Back to Search Pa2e









Figure 3-4 Case Details Screen

As can be seen in Figure 3.4, the view case details page displays all the information


shown on the search page. In addition to this information, details like comments are also


shown here. The user is provided with two options: he can either view the images of this


case by clicking on "Total Images" or he can go back to the search page by clicking on

"Back to Search Page" to look at the other cases.

Figure 3.5 shows the browse images view. Usually, a case contains more than one

image. When a user wants to see all the images related to a case, he is taken to the screen


shown in this figure. Here, the first image of the case is shown. Below the image, four


buttons are provided for navigation through the images.









File E i i F o spi File Edi View Favorites Tools Help
-Back (D 1 0 i QSearch MFavorites ledia J i 11 '15 ED Address htt: local l host/Shr iram/viewirrages1 asp?case number 800&search celI%20carcinoma&m ark Mark0 j 'Go BLinks



















Figure 3-5 View Images Screen

Note that in Figure 3.5, the "first" and "previous" buttons are disabled. This is because, the first image for this case is displayed, so it is logical to disable them. In Figure 3.6, the "next" and "last" buttons are disabled showing that it is the last image of the case. When showing images that are neither first, nor last, all of the buttons are enabled. This ensures that unnecessary confusions and problems do not arise while browsing through the images.

Again, at any point the user has the option of going back to the search results view or to the case details view. This is true for all the three views the user can, at any time switch from one view to the other. This not only makes browsing through the set of cases easy, it also makes the process of browsing efficient.































I I j Back to Search Page View Case Details I0 Done I-- -- I Loc' int-anet
Figure 3-6 Another View Images Screen

The search results view is customizable by the administrator. When the search

results are shown, the script first checks the value of a "display" parameter for each field in the RTF. Only when this parameter is true for a field is it shown in the results. These parameters can be set or unset by the administrator, thereby changing the default behavior of the search results view.

Additionally, some users might want to use the RTF to test their skills. To do this, they would like to view all the details of a case except the diagnosis. This way, they can try to determine what the diagnosis might be and then see if their answer matches the one in the RTF. To facilitate this, a link is provided at the bottom of the page that allows the user to search without the diagnosis. This offers the same user-interface, except that the


File Edit View Favorites Tools Help I
*Back 0 A f2 ( * Search WFavorites 4Media J 1I y- a 12 - 111 '1
Address I )http://localhost/Shriram/viewimagesl~asp e Go I Links,


J'a http://localhost/Shriram/viewimagesl.asp - Microsoft Internet Explorer


-_1fi X.









diagnosis field is not shown until the user selects it [Harvard Medical School 2001, University of Washington 2000].

3.2.2 Advanced Search

Sometimes, the user may want to add other criteria into the search. For example, in addition to entering a search phrase, the user might want to limit his search results to cases involving only infants. Obviously, this cannot be done with the general search. For such queries, an additional search page called the "advanced search" is provided. Figure

3.7 shows the advanced search screen.

Here, in addition to specifying keywords for the search, the user can specify an ACR code, Image Modality, Age range, Sex, and Contributor. None of these fields are mandatory. The user can enter values into any combination of fields and obtain search results. For example, the user can search for all cases contributed by a contributor "A," where the patient's sex is female. Or, the user might enter a search phrase "heart cancer" and mention an age range for male patients. This helps in finding very specific cases quickly leading to better use of the information in the RTF. The user can also search by ACR code. An ACR code builder is also provided to help the user build an ACR code for the type of cases for which he is looking.

Here, the search results are shown using the same display scripts used by the basic search. Searching without diagnosis is also provided. By using a uniform style for displaying search results, WebSE eases the understanding the user-interface.











File Edit View Favorites Tools Help
+Back - 4 - 0 1 I Search Favorites aMedia I 1- 'a ON a N R
Address I ht: p//Iocalhost/shriram/advancedsearch.htm 6 . I Link


University of Florida Dept of Radiology

Teaching File


Keywords I
ACR Code r ACR Code Builder
Modality ALL
Age ISelect an Age Group "J
Sex SelectaGender-[
Contributor I Select a Contri utor J




Basic Search Search Without Diagnosis ._j
101 Doe 7 FF-Lc tane

Figure 3-7 Advanced Search Screen

3.3 Summary and What Is Next

We first examined the database structure of RTF. Then, the two types of search capabilities were examined. The basic search provides a quick and easy way of finding the information while the advanced search helps the user to specify more complicated queries. Search results are displayed in decreasing order of relevancy helping the user look at the most relevant cases first. Also, a uniform and user-friendly interface helps the physician to concentrate on cases, rather than the search tool itself. In the next chapter we examine the interior design of WebSE.














CHAPTER 4
INSIDE WEB SE

All the features of WebSE shown in the previous chapters are not of much use if the search itself takes a long time to execute. The goal of this thesis was to design a search engine that could search through a database of cases efficiently and present the results in a user-friendly manner.

The main tables to be searched include the TF table and the images table. The main columns that concern us are "diagnosis . . chief complaint," and "comment." All these columns contain more than one word, (i.e., there may be many phrases or sentences within each field) where each phrase or sentence contains more than one word. To search for keywords within a field, a normal "like %" query is used.

This type of search capability is provided by most database systems today. Using this feature of the database system might seem to be a very simple solution and is fairly easy to implement. However, the "like %" query has a big disadvantage in that it is quite slow for larger databases. This is especially undesirable since the teaching file is an evergrowing collection of cases.

Hence, a different search strategy has been implemented in WebSE. The search is made much faster if we index every individual word in a field. Since database systems do not generally support this type of indexing, the problem is reduced to indexing parts of a field, using a database system that only supports indexing of whole fields.









To accomplish this, a set of special tables is created. These tables hold metadata about the data in the RTF tables. These tables, which will be referred to as pseudo-index tables from now on, act as one layer of the index. These tables are again indexed using SQL Server's index support, which forms the first layer of index. Above all, the main data tables are indexed on the primary key, which acts as the third layer of index. Figure

4.1 shows this 3-layer index architecture.


Query Requests





SQL Seler Index Layer 1
On Pseudo-Index Tables

Pseudo-Index Tables of WebSE Layer 2



.SQL Server Index Layer 3
On RTF .


Main RTF Tables


Figure 4-1 The 3-Layer Index Architecture

This 3-level indexing expedites the search process greatly on systems that do not support full-text indexing. Now, whenever the user submits a query, the whole indexing









system can be used to directly fetch the relevant cases from the TF table in optimal time. Now, we examine the index tables and the whole system at runtime.

4.1 Indexing System

There are 3 main tables that compose the pseudo-index, which is the second layer in the 3-layer indexing system. The Figure 4.2 shows the main tables and the relationships that exist between them. A brief description of each table, its significance, and its columns follow.


Wor dList Colum nInfo
WordInfo
Word
Word d Colmi id


Case iiiunber CohniiNane

Column id Colhiin_weigt

Positionl




Figure 4-2 Index Tables and Relationships

4.1.1 WordList

This table tracks all the words occurring in the database. It keeps information

relating to a word and is independent of the case in which the word occurs. For example, let us say that the word "heart" occurs many times in the database across many cases. Then, there will be only one entry for the word "heart" in this table. The information about each individual occurrence of the word "heart" is stored in another table. The following are the columns in this table.









" word: This field has a list of all the words in the database. If a word occurs in any
of the indexed fields, in any record, in the database, then it occurs in this field.
There is a clustered index built on this column. Clustered indices are supported by
most of the commercially available database systems, including SQL Server. When an index is present in a column of a table, an index structure is maintained by SQL Server to point directly to the data page in which the word resides. Also, since the
index is a clustered index, the data is physically sorted and arranged in that order in the data pages. A search for any one word on this index table requires only two PO operations. One for fetching the index page and the other for fetching the data page.

" word-id: This is a unique id given to every word in this table. This is used when
storing extra information about a word in another table.

" count: This tracks the number of times a word has occurred in the database. The
count signifies the importance of the word. Very important words tend to occur far
less often than common words and hence will have lower counts.

" weight: A weight is assigned to each word based on the number of times it occurs
in the database. Commonly used words like "of' and "the" have a very low weight
assigned to them. On the other hand, less frequently used words like "carcinoma"
have a larger assigned weight.

4.1.2 Wordlnfo

This table tracks each instance or occurrence of every word in the wordlist table.

For every word in the wordlist table, there are one or more tuples in the WordInfo table.

" word-id: This is the unique identifier assigned to every word in the wordlist table.

" case - number: The case number uniquely identifies any given case in the teaching
file database. Here, it stores the case-number of the case in which the word in the
word-id field exists.

" column - id: This is a unique identifier given to every column in the teaching file
that is being indexed. Here, it represents the column in which the occurrence of the
word exists.

" position: The position field stores the position of a word in a column in a given
case. Within a given field, the first word is assigned a position value of "0" and
subsequent words get position values incremented by " I." This field is useful when
searching for an exact phrase match.











It must be noted that this table has no primary key. This enables multiple


occurrences of words within a case and within a column. Each occurrence of the word has its own entry in this table.




WordList

word word id coutt weight Wo



Columnlnfo
Column id Coliumnnanll Column weight WordInfo
word word id case number column id position








case number . diagnosis chief colnplahit comment case_ mber


case nimnber CTCotut XlayCoult . TotalCou
case n te




Images
casemimber ImageLocation Image amLot . Imam

C' se n ber





K Represents a clustered hidex on the left most column of the table


Figure 4-3 RTF and Index Tables









Figure 4.3 shows the index tables along with the main tables of RTF and the index built on each table.

4.1.3 ColumnInfo

This table stores the list of columns in the TF database that need to be indexed. This list of columns is stored in a separate table for increased flexibility and customnizability.

* Column id: This is an "auto-number" generated by the database. It serves as a
unique identifier for a column.

* Column -name: This is the actual name of the column as it occurs in the design of
the TF table. This is used by the index populating scripts to obtain the name of the
column to be indexed.

* Column-Weight: This is a weight assigned to each column, by the administrator.
This weight of the column plays an important role in ranking the query results. The weight of a column signifies its importance compared to other columns in the table.
For example, assigning a weight of "2" to Diagnosis and a weight of"1I" to
Comments means that the diagnosis field is twice as important as the Comments
field.

4.1.4 ImageCount

ImageCount is an additional table storing pre-calculated image statistics about

every case. It is not an index table, but just serves as an extra table to speed up querying. It stores the number of types of images of each modality for every case. This statistical information is to be given to the user when displaying search results. However, calculating these values at runtime slows down query processing. Hence, the values are pre-calculated and stored in this table. At runtime, the values can be obtained from here.

Note that, initially, the index tables are populated based on the data in the RTF at that time. However, as updates are made to RTF, the index tables also need to reflect these changes. Keeping the index tables updated is very important since all the searching









is done through these tables. Otherwise, this could lead to cases that are never shown in the search results or non-existent cases appearing in the results.

In WebSE, this is done with the help of triggers. Whenever a new case is added to the RTF, triggers add the information about the words in the new case to the index tables. Similarly, when a case is deleted, the related information is deleted from the index tables. When an update is made to a case, all the information related to that case is deleted from the index tables and then updated data is re-inserted.

We have seen the various components of the indexing system. Let us now see how the 3-layer indexing system performs during run-time.

4.2 Query Processing: How It Works

When a user submits a search phrase, the web server creates an instance of the search engine script. Then, the search phrase entered by the user is split into individual words by assuming "space" as the delimiter. The script stores all the individual words in an array. Each word is scanned for * (wildcard). Depending on the whether wildcard characters are found or not, a normal SQL query or a wildcard SQL query is built. This query is submitted to the Database engine.

Every database engine develops a "query execution plan" before actually executing a query [Stonebraker and Hellerstein 1998]. An execution plan is nothing but a sequence of low-level operations that the database engine needs to perform to execute the query. The plan takes into consideration the indexes present on all the tables involved and also the sizes of the tables involved and optimizes the plan for time taken to execute the query. Here also, once the SQL Server receives a query from the web script, it analyzes the query and tries to formulate a plan. Due to the existence of the index system, the database engine executes the query in a particular way.









We now examine how the query processor executes the query and see why it is

quicker than a traditional execution plan. Note that there may be some slight variations in the actual processing path taken by the query processor of SQL Server from what is mentioned below [Stonebraker and Hellerstein 1998]. However, the main steps taken remain the same. We first look at the execution path for a traditional "like %" query and then look at our system.

4.2.1 Traditional System

Let us examine the following SQL query, analyzing its run-time execution.

"Select * from TF where diagnosis like '%heart%'

orchief complaint like '%heart%'

or comment like '%heart%' "

The above query looks for all cases that have the word "heart" mentioned

somewhere within the diagnosis, chief complaint, or comment fields. Note that, "%" is the wildcard character in SQL.

So, the job of the database engine now is to find all the cases that have the word "heart" mentioned anywhere in the given fields. But, the database engine does not know which cases in the TF table have the word occurrence. So, the only way to execute this query will be to look at each and every case in the TF table. To do this, it has to fetch every data page that contains any record of the TF table and search the relevant fields for the word "heart."

Since every page-fetch involves 1/0, which is very expensive in terms of time, the whole process can be considerably slow. As the size of the database increases, the total number of 1/0 operations required to execute a query, also increases linearly. Let us now compare this with what happens in WebSE.









4.2.2 WebSE

Query processing in Web SE is performed in two parts, first we look at queries where the user only has one search term and then we look at multiple-term queries.

4.2.2.1 Single term search

Let us look at the same example used previously. To search for all the cases with the word "heart," the search engine presents a different SQL query to the database engine. This query makes use of the 3-layer indexing system and offers some speedup to query processing. The query is as follows

"Select * from TF where case-number in

(Select case-number from wordlnfo where word id in (Select word id from wordlist where word='heart'))." Query execution always starts with the innermost query. So, the first step in the execution plan is to search the wordlist table for the word "heart." If the word is present in the wordlist table, then it exists in the TF database. Otherwise, it can be safely assumed that the word does not exist in the TF too. Since there is an index on this table's "word" field, SQL Server maintains a separate index structure that has pointers to the exact location of every record in its data page, based on the word. SQL Server uses this index structure implicitly for fetching the right tuple from the wordlist table. So, the data page of the wordlist table that has the word "heart" in it is retrieved. Now, the word id from this record can be read.

The execution now proceeds to the next step and executes the next query in the

hierarchy. The goal of this step is to fetch all the records in WordInfo table that have the word id selected from the previous step. Again, there is a clustered index on the word id field here. So, the database system knows which data pages need to be retrieved and only









those are fetched. Also, since the index is clustered, all records with the same word-id are stored close to each other, making the number of pages to be fetched, minimal. The case-numbers are selected from this set of records.

Now the outer most query needs to be executed. The database engine needs to fetch all cases that have the given case-numbers. Since the TF table also has an index on the case-number field, the database engine retrieves the required data pages into memory. Thus, the required cases are obtained from the TF without having to search the entire TF table.

Of course, there has been an overhead of two extra 1/0 operations to fetch the index table pages. But considering that TF is a very large table and fetching all the pages into memory takes a significant amount of time, the time saved is much greater. Note that the same logic applies to fetching the image information from the images table.

The other advantage of this design is that, as the size of the database grows, the time to execute the query does not increase linearly (as with the traditional approach). The total 1/0 done decides the time taken to execute the query. In the traditional approach, as the size of the database increases, a search would involve more page fetches, making the execution time increase linearly with the size of the database. Here, no matter what the size of the database, only the required pages will be fetched. However, this does not mean that the execution time is totally independent of the database size. All the pages that have cases from the result set will be fetched anyway and so execution time depends on the number of cases in the final search results.

4.2.2.2 Multiple term search

If there is more than one word in the search string, then the search process is done as a simple extension of the single-term search. First, a search is performed for all the









individual words, on the wordlist table; similar to the way it is done in a single-term search, which results in a list of word-ids. These word-ids are used to get a list of case-numbers for each word id. A different list results for each of the word-ids. Every list represents the list of cases having the word with the given word id somewhere in some column of the TF table. Now, these lists are merged together resulting in a final list of case-numbers, which is used to retrieve the cases from the TF. Again, the rules of indexes still hold, making all searches optimal.

Web SE also supports queries with wildcards. In a search phrase, the user can add a wildcard before or after any word. For example, the user can search for "cardio* *nomyj' This searches for all words starting with "cardio" and all words ending with "nomy." The steps involved in executing wildcard queries are exactly the same.

It should be noted that the basic idea used in the search strategy above is that any case having at least one of the words from the search phrase, in some column or the other, is relevant. This is a well -established technique that is the basis of all the websearch engines and information retrieval systems.

In addition to searching the TF table efficiently, WebSE also ranks the search results based on relevancy. We now examine how ranking of cases is performed in WebSE.

4.3 Ranking

Ranking is the process of ordering the search results in decreasing order of

relevancy [Eylen 1997]. There are many ways of computing "relevancy" of a case for a particular search phrase. The following techniques are used within Web SE.









4.3.1 Frequency of Occurrence

Given a search word, a case that has more occurrences of that word in its record is assumed to be more relevant than a case having fewer occurrences of the same word. All major search engines on the web use this principle of "word frequency" for ranking their results. This statistical approach to ranking in a combination with other strategies has proven to be extremely successful.

4.3.2 Column of Occurrence

In addition to using the frequency of occurrence, search engines like Altavista

[Eylen 1997] and Google consider the location of the word as well. Documents having the search word in a heading get more weight than documents having the word at the bottom. This idea has been employed within our indexing system. However, the direct application of this principle does not make much sense in our indexing system. For example, the name of a disease might occur either in the bottom or the top of the description of the disease (in any column). Obviously, giving more weight to a case where a disease name occurs closer to the beginning is really not significant.

However, consider the following example. Let us say that a user enters a search term, which is the name of a disease. Suppose there are three cases that have the disease name mentioned. Each case has the word occurring in a different column. It would be a good idea to give cases with the word occurring in one column more importance than the others. This is where the ColumnInfo table is used. As mentioned before, this table has a column called "columnWeight." Each column that is to be indexed is given a weight in this column. This is a way of assigning importance or relevance for a column. Based on these weights, the case with the word occurring in the most important column can obviously be considered as the most relevant among its peers.









4.3.3 Adjacency of Words

Yet another strategy used to order relevant results is to use an exact phrase match. This is also used in all the major web search engines and is a pretty simple approach. If a user is looking for the search phrase "heart cancer," then it makes sense that cases having the words "hearf' and "cancer" adjacent to each other in the same order are more relevant than cases having just one of the two words. Also, this principle can be applied recursively.

For example, if the user enters the phrase, "swollen heart anomaly," then, as earlier mentioned, the cases with the phrase "swollen heart anomaly" will be more relevant than cases with only one individual word. In addition to this, cases with partial phrase match should also obtain a higher weight than cases with individual occurrences (i.e., cases with "swollen heart" or "heart anomaly" obtain a higher weight than ones with only "heart" or "anomaly" or both of them in any other order).

4.3.4 Weighted Words

If a search phrase has more than one word, then among the cases with only one

word, the cases with the more important word obtains a higher rank than cases with a not so important word.

4.4 Implementation of Ranking

4.4.1 Ranking in Single Word Searches

Now, for single word searches the weight of the word is multiplied with the weight of the column in which each occurrence of the word exists. The weight of the word is obtained from the wordlist table and the weight of the column in which it occurs can be obtained from the ColumnInfo table. After completion of the above procedure, the result is a list of case-numbers and corresponding weights. There might be multiple









occurrences of a word in a given case, which means the initial list would have had weights of each individual word occurrence. To consolidate the list, we group the result set by case number and sum the weights. Mathematically, it can be represented as

For each case, the final weight =

[ 1: (column weight) word weight
For all occurrences of
the word in a case
Finally, we have a list of case-numbers and a final weight for each case. This is used to rank the cases such that the case with highest weight gets the best rank. It is obvious that cases with more occurrences of the word in the more important columns are in the top of the list and cases with fewer words in lesser important columns are towards the end and all other combinations exist somewhere in the middle of the list. Changing the weights assigned to columns can change the behavior of the search results greatly.

4.4.2 Ranking in Multiple Word Searches

4.4.2.1 Additional strategy

Ranking multiple-word search queries is very different from ranking single-word queries. Here, in addition to the above-mentioned strategies (for single-word queries), the relative ordering of the words within the search phrase also needs to be taken into consideration. The cases where the "exact search phrase" is found are obviously most relevant and should get the maximum weight. This forms the first tier of cases. In the second tier are cases having all the words of the search phrase, where these words are scattered across the case. Again, column weights still hold here.

The third tier consists of all cases having more than one word but not all of them. Ranking of cases in the third tier can be a bit complicated. Cases which having a larger number of higher weighted words in more weighted columns get a better rank than cases









having fewer words in lower weighted columns. Tweaking the weights of the columns can change the behavior of this tier of cases dramatically.

The final tier of cases contains only one of the search terms. Obviously these are the lowest ranked among the tiers and within this tier, the ranks of cases vary based on the word they have and the column in which the word exists.

To summarize, the cases with the exact-phrase need to be ranked first. The cases without an exact-phrase match follow next. And among them, cases with more words are ranked higher than cases with fewer words. First, we will look at an example and then examine how this is implemented.


Table 4-1 Example Table for Ranking

case-number Comment
I This is an interesting case of heart cancer
2 Patient complained of pain in the kidney
3 Good Example of malfunction in the heart
4 Lungs severely affected due to smoking
5 a cancer of heart is suspected
6 patient might need a heart transplant soon
7 Looks like cancer

Let us say Table 4.1 is a simplified RTF. For simplicity, it has only two columns and contains 7 cases. Now, when a user submits the search phrase "cancer of heart," all cases except case number 4 appear. Case number 5 appears first because it has an exact phrase match - this forms the first tier of cases. Case number I comes next since it has all the words of the search phrase though not in that order. This is the second tier of cases.

Case number 3 forms the next tier. It has two words from the search phrase "heart" and "of." Case numbers 2, 6, and 7 comes last as they all have only one word









from the search phrase. These form the last tier of cases. The ranking among these cases depends on the weights assigned to the words themselves.

4.4.2.2 Implementation

First an exact phrase search is done. For this, the case-number lists for all the individual words are calculated. Once these lists are computed, the position fields are checked to see if the words exist adjacent to each other within a given case and column. The lists are merged based on this condition.

In the end, only one list of case-numbers exists, which are the cases where the

exact phrase is present. This list of cases is kept aside for the time being. Also, a special "exact-phrase" constant value is added to all these case's weights to make sure that they get a better rank than the other lists of cases. It should be noted that the exact-phrase match works recursively. In other words, even partial exact phrase matches are given extra weight when compared to the single word matches.

As the next step, each word is searched in the WordList table, and then with its

word-id all the info is obtained from the WordInfo table. So, at the end, we have a list of word-ids and each word-id has a case-number list with weights. If we merge all these lists and group them by case-number and then sum the weights, we obtain one final list of case-numbers with its corresponding final weights.

This list is merged with the "exact-phrase" result list. Since we added a constant to the list of exact phrase weights, they have an increased weight and stay at the top of the list. The cases with a larger number of words obtain a better weight because the sum of all the weights is greater. Mathematically, the formula is

for each case, the final weight =










lI (column weight * word weight)] + exactphrase COflstaflt*fl
For all occurrences of
all words in a case
where "exactphrase constant" is an arbitrary constant added to exact phrase

matches to force them higher in the list. For cases having no phrase matches, this value is zero. And, "n" is the number of phrase matches found in that case.

The final list of case-numbers and weights is sorted in descending order and then the TF table is queried for the corresponding case-numbers to obtain the final list of results.

4.4 Summary and What Is Next

We examined the design of the 3-layer indexing system. We then saw how Web SE performs better than the traditional search engine, by making use of this indexing system. We looked at the ranking strategies used and also how these are implemented. We also looked at how the indexing system updates itself. The next chapter proposes some future extensions to Web SE.














CHAPTER 5
CONCLUSIONS AND) FUTURE WORK This thesis examined the requirements of a search engine for the RTF and then

presented the design and features of Web SE. Web SE focuses on searching the database efficiently and also providing a method of ranking the search results. Presented were an introduction to the problem, a design overview, the design of the existing RTF, the technologies used, the underlying database design, the design of the indexing system and how it improves the query processing, and the ranking strategies used. This chapter presents a final examination of WebSE and proposes some future extensions.

5.1 WebSE

In this thesis, a 3-level indexing system was described and used. It not only helped in making the search process efficient, it also made the search scalable to larger databases without a linear increase in processing time. Web SE also sorts the search results based on the relevancy. This allows the user to look at the most relevant cases first, thus making searching for cases easier and more efficient for the user.

Web SE presents a user-friendly interface making it easily acceptable in the medical community. The interface used is borrowed from Google. Since almost everyone is familiar with the Google interface, using the Web SE interface is very intuitive. Web SE allows a basic search where the user can search the RTF using a set of keywords or a more advanced search, where the user can add more constraints to the search to make it more specific.









Web SE also offers some customnizability. The fields to be shown in the search

results screen can be changed by the administrator. The administrator can also change the weights assigned to columns to change the relative ordering of search results. This allows for greater control over ranking.

5.2 Extensions

The following features can be added to Web SE to make it more user-friendly and powerful as a search tool.

5.2.1 Synonyms

WebSE uses a keyword-based search to find relevant cases in the RTF. However, this is not free from limitations. It is a known fact that synonyms exist for many English words and medical terms. So, if someone searches for the term "heart," Web SE will only find cases containing that word. However, there might be some other good cases in the RTF that are very much relevant to "heart" that use the medical synonym "cardio." Right now, the user can get around this by searching for all the synonyms of a word, but it would better if Web SE can build a database of synonyms and automatically search for all of them.

5.2.2 Spelling Mistakes

Since most of the information in the RTF is entered through typing, there is always a chance of spelling mistakes. This could lead to cases not appearing in the search results. For example, if the word "kidney" is misspelt as "kidey," then this case will not appear for the search term "kidney" unless there is an occurrence of "kidney" somewhere else in the same case. As a future extension to Web SE, a strategy for catching such typographical errors can be devised. This will improve the accuracy of the search system and will lead to better utilization of the information in the RTF.






49


5.3 Summary

In the information age, an efficient and easy-to-use search tool can be priceless in any organization, including the medical community. With WebSE's simple and userfriendly interface and efficient and scalable search, the information in the RTF can be accessed quickly. Once the physicians are able to access any case quickly, they can provide a better teaching experience by using these for illustration.















LIST OF REFERENCES


ACR. 1994. American College of Radiology. www.acr.org. Visited Apr 2002.

Balakrishnan A. 2002. Design and Analysis of User Interface for Radiology Teaching
File. Master's Thesis. University of Florida.

Dugas M., Trumm C., Stabler A., Pander E., Hundt W., Scheidler J., Briining R.,
Helmberger T., Waggershauser T., Matzko M. and Reiser M. Case Oriented
Computer-Based Training in Radiology: Concept, Implementation and Evaluation.
Oct 2001. BMC Medical Education, University of Munich, Germany.

Elmasri R. and Navathe S.B. 2001. Fundamentals of Database Systems. Pearson
Education, Singapore.

Eylen D.V. 1997. AltaVista Ranking of Query Results.
http://www.ping.be/-ping065 8/avrank.html. Visited Feb 2002.

Harvard Medical School. 2001. BrighamRAD.
http://brighamrad.harvard.edu/education/online/tcd/tcd.html. Visited May 2002.

Microsoft. 2000a. VBScript.
http://msdn.microsoft.com/library/default.asp?url=/library/enus/script56/html/vtoriVB Script. asp. Visited Mar 2002.

Microsoft. 2000b. MS Access Help. Microsoft, Seattle.

Microsoft. 2000c. SQL Server Help. Microsoft, Seattle.

Microsoft. 2001. Microsoft ODBC and Universal Data Access.
http://www.microsoft.com/data/odbc/default.htm. Visited Mar 2002.

Microsoft. 2002. Active Server Pages.
http://msdn.microsoft.com/librarEy/default.asp?URL=/Iibrary/psdk/iisref/aspguide.ht
m. Visited Nov 2001.

Stonebraker M., Hellerstein J.M. 1998. Readings in Database Systems. Morgan
Kaufman, San Francisco, CA.

University of Washington. 2000. Radiology Teaching File.
http://www.rad.washington.edu/maintf/. Visited May 2002.
















BIOGRAPHICAL SKETCH

Shriram Lakshmi Narasimhan was born in Tiruchy, Tamil Nadu, India, on

November 18th, 1978. He has lived in Hyderabad for the first 21 years of his life. He graduated from Jawaharlal Nehru Technological University in 2000 and received his bachelor's degree in computer science and engineering.

He is pursuing his graduate study with a specialization in databases at the

University of Florida. He plans to graduate in December 2002 with a master's degree in computer science.

Shriram worked for the University of Florida McTrans Center as a programmer during his graduate study. There, he contributed to the development of the Highway Capacity Software, among other things. He plans to move to Seattle where he has a position with Microsoft as a software design and test engineer in the SQL Server group.




Full Text
xml version 1.0 encoding UTF-8 standalone no
fcla fda yes
!-- Web based search engine for radiology teaching file ( Book ) --
METS:mets OBJID UFE0000559_00001
xmlns:METS http:www.loc.govMETS
xmlns:xlink http:www.w3.org1999xlink
xmlns:xsi http:www.w3.org2001XMLSchema-instance
xmlns:daitss http:www.fcla.edudlsmddaitss
xmlns:rightsmd http:www.fcla.edudlsmdrightsmd
xmlns:mods http:www.loc.govmodsv3
xmlns:sobekcm http:digital.uflib.ufl.edumetadatasobekcm
xmlns:lom http:digital.uflib.ufl.edumetadatasobekcm_lom
xsi:schemaLocation
http:www.loc.govstandardsmetsmets.xsd
http:www.fcla.edudlsmddaitssdaitss.xsd
http:www.fcla.edudlsmdrightsmd.xsd
http:www.loc.govmodsv3mods-3-4.xsd
http:digital.uflib.ufl.edumetadatasobekcmsobekcm.xsd
METS:metsHdr CREATEDATE 2020-04-28T10:07:42Z ID LASTMODDATE 2020-04-27T15:19:15Z RECORDSTATUS COMPLETE
METS:agent ROLE CREATOR TYPE ORGANIZATION
METS:name UF,University of Florida
OTHERTYPE SOFTWARE OTHER
Go UFDC - FDA Preparation Tool
INDIVIDUAL
UFAD\renner
METS:note Online edit by Nicola Hill ( 8/25/2010 )
Online edit by Nicola Hill ( 9/24/2010 )
Online edit by Nicola Hill ( 10/1/2010 )
METS:dmdSec DMD1
METS:mdWrap MDTYPE MODS MIMETYPE textxml LABEL Metadata
METS:xmlData
mods:mods
mods:abstract displayLabel Abstract lang ,, Teaching in any field can be successful only with the help of good illustrations. This is especially true in the medical field, where even an excellent professor can sound vague, without good examples. As a result, the academic medical community started collecting together illustrative images of cases in a database. This database is known as the "Radiology Teaching File" or simply "RTF" and has proven to be a great asset to academic radiology departments. However, as the size of the database increases, maintaining the database and searching the database for cases with a particular characteristic become time-consuming and cumbersome. Hence, there is a need for a system that can provide an easy to use interface to maintain the database and also a retrieval system that can fetch the relevant cases. The focus of this thesis is to provide an easy-to-use search engine that can search the RTF efficiently. This search engine, called WebSE (Web-based Search Engine), was developed with usability, versatility, and scalability in mind. The user can search the database of cases using an easy-to-use general search, like Google (the web search engine), or the user can use a more complicated interface to run a more precise search. In the former, a relevancy ranking mechanism helps the user to see the most important cases first. As a result, this tool helps in making teaching and learning more effective.
type subject Subject engine, file, radiology, ranking, relevancy, search, teaching
mods:accessCondition Copyright Lakshmi, Shriram. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
mods:language
mods:languageTerm text English
code authority iso639-2b eng
mods:location
mods:physicalLocation University of Florida
UF
mods:url access object in context https://ufdc.ufl.edu/UFE0000559/00001
mods:name personal
mods:namePart Lakshmi, Shriram
mods:role
mods:roleTerm marcrelator dis
Dissertant
Dankel, Douglas D.
ths
Thesis advisor
Sistrom, Chris
rev
Reviewer
Wilson, Joseph N.
rev
Reviewer
mods:note system details System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
Title from title page of source document.
Includes vita.
thesis Thesis (M.S.)--University of Florida, 2002.
bibliography Includes bibliographical references.
Text (Electronic thesis) in PDF format.
mods:originInfo
mods:publisher University of Florida
mods:dateIssued 2002
mods:copyrightDate 2002
mods:recordInfo
mods:recordIdentifier source sobekcm UFE0000559_00001
mods:recordContentSource University of Florida
mods:relatedItem original
mods:physicalDescription
mods:extent ix, 51 p.
mods:subject SUBJ650_-0_1 jstor
mods:topic Database design
SUBJ650_-0_2
Databases
SUBJ650_-0_3
Indexing
SUBJ650_-0_4
Keyword searching
SUBJ650_-0_5
Physicians
SUBJ650_-0_6
Radiology
SUBJ650_-0_7
Search engines
SUBJ650_-0_8
Search terms
SUBJ650_-0_9
SQL
SUBJ650_-0_10
Web servers
Computer and Information Science and Engineering thesis, M.S
Dissertations, Academic
UF
Computer and Information Science and Engineering
Radiography, Medical
Digital techniques
User interfaces (Computer systems)
Web search engines
mods:titleInfo
mods:title Web-based search engine for radiology teaching file
mods:typeOfResource text
DMD2
OTHERMDTYPE SOBEKCM SobekCM Custom
sobekcm:procParam
sobekcm:Aggregation ALL
UFIR
UFETD
IUF
GRADWORKS
sobekcm:MainThumbnail lakshmi_s_Page_01thm.jpg
sobekcm:Wordmark UFIR
sobekcm:bibDesc
sobekcm:BibID UFE0000559
sobekcm:VID 00001
sobekcm:Publisher
sobekcm:Name University of Florida
sobekcm:PlaceTerm Gainesville, Fla.
sobekcm:Source
sobekcm:statement UF University of Florida
sobekcm:SortDate 730850
METS:amdSec
METS:digiprovMD DIGIPROV1
DAITSS Archiving Information
daitss:daitss
daitss:AGREEMENT_INFO ACCOUNT PROJECT UFDC
METS:rightsMD RIGHTS1
RIGHTSMD Rights
rightsmd:accessCode public
rightsmd:embargoEnd 2005-12-27
METS:techMD TECH1
File Technical Details
sobekcm:FileInfo
sobekcm:File fileid JP21 width 2550 height 3300
JPEG1 630 815
JPEG2
JP22
JPEG3
JP23
JPEG4
JP24
JPEG5
JP25
JPEG6
JP26
JPEG7
JP27
JPEG8
JP28
JPEG9
JP29
JPEG10
JP210
JPEG11
JP211
JPEG12
JP212
JPEG13
JP213
JPEG14
JP214
JPEG15
JP215
JPEG16
JP216
JPEG17
JP217
JPEG18
JP218
JPEG19
JP219
JPEG20
JP220
JPEG21
JP221
JPEG22
JP222
JPEG23
JP223
JPEG24
JP224
JPEG25
JP225
JPEG26
JP226
JPEG27
JP227
JPEG28
JP228
JPEG29
JP229
JPEG30
JP230
JPEG31
JP231
JPEG32
JP232
JPEG33
JP233
JPEG34
JP234
JPEG35
JP235
JPEG36
JP236
JPEG37
JP237
JPEG38
JP238
JPEG39
JP239
JPEG40
JP240
JPEG41
JP241
JPEG42
JP242
JPEG43
JP243
JPEG44
JP244
JPEG45
JP245
JPEG46
JP246
JPEG47
JP247
JPEG48
JP248
JPEG49
JP249
JPEG50
JP250
JPEG51
JP251
JPEG52
JP252
JPEG53
JP253
JPEG54
JP254
JPEG55
JP255
JPEG56
JP256
JPEG57
JP257
JPEG58
JP258
JPEG59
JP259
JP260
JPEG60
METS:fileSec
METS:fileGrp USE archive
METS:file GROUPID G1 TIF1 imagetiff CHECKSUM f38126891fcd062c2d5c40ff5d3a8c4b CHECKSUMTYPE MD5 SIZE 8433416
METS:FLocat LOCTYPE OTHERLOCTYPE SYSTEM xlink:href lakshmi_s_Page_01.tif
G2 TIF2 a1d32b1e7afc94ef44de476ccebd5af0 8432940
lakshmi_s_Page_02.tif
G3 TIF3 0caa22410331c9a34d334fae1d7d1d27 8432880
lakshmi_s_Page_03.tif
G4 TIF4 2fd0560e97ebbebb8ccfd11aeb904bce 8434324
lakshmi_s_Page_04.tif
G5 TIF5 7b94e949d7ac71ab3b3725c6ee924be8 8434260
lakshmi_s_Page_05.tif
G6 TIF6 2897fc7eca24f77cc51301f4509088a0 8434584
lakshmi_s_Page_06.tif
G7 TIF7 39883be4fe70b94cc34964ecf7a8d22b 8433788
lakshmi_s_Page_07.tif
G8 TIF8 47fe28513a51a6e6054789ce0285525f 8435208
lakshmi_s_Page_08.tif
G9 TIF9 a833404f5a8ea3b54b95dac853478662 8433764
lakshmi_s_Page_09.tif
G10 TIF10 e61bb12c859efd825d070c7d0fe214bd 8435624
lakshmi_s_Page_10.tif
G11 TIF11 213d088833bd42f2acbb07e1a445c9a1 8436412
lakshmi_s_Page_11.tif
G12 TIF12 d85337f3f3350227bc658881276983b4 8435672
lakshmi_s_Page_12.tif
G13 TIF13 e146e70826f175e05b0168d3e13fd297 8435200
lakshmi_s_Page_13.tif
G14 TIF14 6c10b61b1326f23596a4814cb7d23b0d 8436080
lakshmi_s_Page_14.tif
G15 TIF15 1784628e63f9ee839bcafe3352e9891c 8434724
lakshmi_s_Page_15.tif
G16 TIF16 f705e44d70656ef831ffedc87ad63e69 8435712
lakshmi_s_Page_16.tif
G17 TIF17 d4e1c568a6ec21bd8d109d40271cea0f 8435220
lakshmi_s_Page_17.tif
G18 TIF18 df1081de52f690df84cb3425ccd3960d 8436088
lakshmi_s_Page_18.tif
G19 TIF19 a69c80bf473551bf0ad57617a938a4f7 8435864
lakshmi_s_Page_19.tif
G20 TIF20 acc68ff4a95a4f2135df7faf548f00d5 8436148
lakshmi_s_Page_20.tif
G21 TIF21 7810245a13abb7713e33ad29d16a93f4 8436200
lakshmi_s_Page_21.tif
G22 TIF22 727b3b72614bd023c47eb941c40c84b4 8436112
lakshmi_s_Page_22.tif
G23 TIF23 198c9cb24f89b49f1026d8442a8c6b70 8436188
lakshmi_s_Page_23.tif
G24 TIF24 e64fceb68ea4f7cf42f8af87c3de9105 8436096
lakshmi_s_Page_24.tif
G25 TIF25 b9a460f29bc18d34848e1eaeab4ce3b6 8433616
lakshmi_s_Page_25.tif
G26 TIF26 5c2c8df2297e2dabfbe48b7bf50f2a98 8434996
lakshmi_s_Page_26.tif
G27 TIF27 dafcfe42c8b34798ab744adc1a00ad84 8435420
lakshmi_s_Page_27.tif
G28 TIF28 0152b552426451588cd6995c7296f4ab 8436144
lakshmi_s_Page_28.tif
G29 TIF29 811a1f720ed3758e9afac2ee73db527a 8435816
lakshmi_s_Page_29.tif
G30 TIF30 fead979f628135f3fc90132434e781b7 8435484
lakshmi_s_Page_30.tif
G31 TIF31 43bd2e9bc13eb807038b9b10bd8b3841
lakshmi_s_Page_31.tif
G32 TIF32 3c9f1618561be7560f5eb4ccc1a61257 8435148
lakshmi_s_Page_32.tif
G33 TIF33 ca77aa8714ee4cb2a003c3f713b72397 8435740
lakshmi_s_Page_33.tif
G34 TIF34 aaafe6c7a685e98978425abdd573b0fe 8435444
lakshmi_s_Page_34.tif
G35 TIF35 5936c774b99fe8eafd3a082c8703e8ea 8436020
lakshmi_s_Page_35.tif
G36 TIF36 c4aa5c0802baaa919f653dbfa4cf98cc 8435928
lakshmi_s_Page_36.tif
G37 TIF37 8f45566867a17a97a706b4614080bd17 8435728
lakshmi_s_Page_37.tif
G38 TIF38 9dafaab8489da143bde94e4843c28d8b 8435036
lakshmi_s_Page_38.tif
G39 TIF39 ad286dde13cc00c4dfd0562dc6cd0ae2 8435500
lakshmi_s_Page_39.tif
G40 TIF40 f16fd282e66ad389dc6fbf673615bd76 8434908
lakshmi_s_Page_40.tif
G41 TIF41 4bfeb96e777502774c7669628e6097bf 8435540
lakshmi_s_Page_41.tif
G42 TIF42 68d46ec5465a0a3470ef9436051b048e 8436192
lakshmi_s_Page_42.tif
G43 TIF43 b8e334ea240d20d1e32ead9678b2276f 8434872
lakshmi_s_Page_43.tif
G44 TIF44 ace301cd365a2cd4d33a8ee0ee1127eb 8436152
lakshmi_s_Page_44.tif
G45 TIF45 d5e2a49bbb6da63ea8971528f8b345ac 8436232
lakshmi_s_Page_45.tif
G46 TIF46 143b898f29135ee84c679dbc9ebf38ea 8435924
lakshmi_s_Page_46.tif
G47 TIF47 4269ad657d5b93145d41e67c85ae592a 8436072
lakshmi_s_Page_47.tif
G48 TIF48 82d9d4ed55d20dd65d969f39d7f7a793 8436240
lakshmi_s_Page_48.tif
G49 TIF49 1253ea0103de8009362cbf32ca6571a6
lakshmi_s_Page_49.tif
G50 TIF50 4bf1277eec79c242e5f90081b12e9c40 8436076
lakshmi_s_Page_50.tif
G51 TIF51 140592975ade37bc586f230ecba0b28c 8436124
lakshmi_s_Page_51.tif
G52 TIF52 da8a4be5ecf0e47f90dfce53549799ad 8436084
lakshmi_s_Page_52.tif
G53 TIF53 5ea854f653b13bf6f6fec96a60d08b69 8435828
lakshmi_s_Page_53.tif
G54 TIF54 2b2ef2cdebff296d513db97c4c5e1bb6
lakshmi_s_Page_54.tif
G55 TIF55 2bdf46deec57c85431ee3a3fdebafaed 8434548
lakshmi_s_Page_55.tif
G56 TIF56 f1d60f7790be8c46bf55ccc57df4f736 8435576
lakshmi_s_Page_56.tif
G57 TIF57 8aa555c6b1c71228e69263a218f59d26 8436000
lakshmi_s_Page_57.tif
G58 TIF58 89e1328c31f8690d68b7801ec28bde97 8433624
lakshmi_s_Page_58.tif
G59 TIF59 a0af0c3761fb1636181a735a819f7b60 8435452
lakshmi_s_Page_59.tif
G60 TIF60 387c7d5fc6f8562572f083d6acbb527d 8434464
lakshmi_s_Page_60.tif
reference
imagejp2 451912d39c4b69c36a6f442f6fc466ec 211959
lakshmi_s_Page_01.jp2
e211e9d45982034146b1c321d6d459d5 27593
lakshmi_s_Page_02.jp2
cbea07ac0e9f79bb8a37b3a92cfda61e 19833
lakshmi_s_Page_03.jp2
d9433238f4c54f340f0c03727269b84f 503081
lakshmi_s_Page_04.jp2
a722e61667ecb712308ca1942dc1852a 737142
lakshmi_s_Page_05.jp2
659112b891903a9f6f082777b096ab44 927247
lakshmi_s_Page_06.jp2
f70497a2ed4e307a26df4247620178a8 383070
lakshmi_s_Page_07.jp2
c834571ae1feec9cd8f9057d132d7e7d 764593
lakshmi_s_Page_08.jp2
7dba5379cdd333436f9172869df3793c 299162
lakshmi_s_Page_09.jp2
71c0d40613f266cf9e4e1167343987be 936508
lakshmi_s_Page_10.jp2
b330fd0883c08c018258748d693fd0e4 1033989
lakshmi_s_Page_11.jp2
673b859370fdf9ed5a9a2a32674105df 884599
lakshmi_s_Page_12.jp2
26a3774289d9b6d6a84af6cdcd68f30f 658914
lakshmi_s_Page_13.jp2
3264a09bac9452dc4fb56960cb453481 1037098
lakshmi_s_Page_14.jp2
a9e4faaa1c9df42e42a9c9ee7f9d0b11 588288
lakshmi_s_Page_15.jp2
3d30a5f951dbb66cf6a4f866ce94f091 937799
lakshmi_s_Page_16.jp2
bd8202d02163e052abbb3e918e230b96 775492
lakshmi_s_Page_17.jp2
fbe781372cdeef0a0c8f4c97104c5a10 1051984
lakshmi_s_Page_18.jp2
17226ee0c6d4d51cee71792ef08e4c2f 1035517
lakshmi_s_Page_19.jp2
6e6efb704be9c3a490c170c7750bd107 1047375
lakshmi_s_Page_20.jp2
0af59656390ecc54308e78d998acba0c 1051978
lakshmi_s_Page_21.jp2
a616defe5505025a3f3160a22c89044f 1051981
lakshmi_s_Page_22.jp2
729bb8b5cd1b8fc9f73126402fe36633 1051949
lakshmi_s_Page_23.jp2
f5ca181d57bd9850b6a2b233610197ba 1051979
lakshmi_s_Page_24.jp2
d360e9c5c7a82f47ffbf65ec8408493d 255068
lakshmi_s_Page_25.jp2
cc1629fb8b57cee7296a5c6e6a033632 728628
lakshmi_s_Page_26.jp2
134f16372b444f36e3ebb47397f679a6 836325
lakshmi_s_Page_27.jp2
9674928c5b76aa75108925d184d47c42 1043685
lakshmi_s_Page_28.jp2
f4081b6f9a8f87e4a8d2e8ddca077d65 964429
lakshmi_s_Page_29.jp2
755bda20cd2714cfd02596f44481c8ee 819448
lakshmi_s_Page_30.jp2
b8a19423f14553a9f5d69ee5808f1128 761882
lakshmi_s_Page_31.jp2
81d31057f4870532649b603c93798b2c 918949
lakshmi_s_Page_32.jp2
f505eaa6955a0102113652bb9ab6a909 910977
lakshmi_s_Page_33.jp2
efa6c8bc2f091342433d066fefc9114c 959399
lakshmi_s_Page_34.jp2
8408cd7c9db9f3e6d2770115b8001fba 958375
lakshmi_s_Page_35.jp2
918a7cd68fee03a622edc8136f21fe0b 977985
lakshmi_s_Page_36.jp2
fdb5c7bdccc7d99f78cf9d584b33a2c9 888948
lakshmi_s_Page_37.jp2
2ff34c38217c24c2e41e7ec3ffb4e432 715973
lakshmi_s_Page_38.jp2
599d6cc7db7d214deb731af1424b97cb 847816
lakshmi_s_Page_39.jp2
8c784a9f33ee9ca68ffc0d4ae006f989 698649
lakshmi_s_Page_40.jp2
977ac280f4475d7083d4925338250c24 763984
lakshmi_s_Page_41.jp2
5c084e784717fbb38001074d71d86a71
lakshmi_s_Page_42.jp2
42768d364d6543411dab60d66723f543 610531
lakshmi_s_Page_43.jp2
c9d8b55d6151d50a7a7acea674327bf3 1035412
lakshmi_s_Page_44.jp2
0e091588e912a39b19fbc9c66eaca069 1051906
lakshmi_s_Page_45.jp2
624236601158e8595fab52854b731852 948714
lakshmi_s_Page_46.jp2
e3f2505b7967c4253b3bb2e70ea2f6f2 1004137
lakshmi_s_Page_47.jp2
957c1054b3e372e490ebe3819683f798 1051938
lakshmi_s_Page_48.jp2
c6730957fc20791713ce796b24c9e909 947281
lakshmi_s_Page_49.jp2
20d91e94d24cf6947f3e81c268a818d5 1051952
lakshmi_s_Page_50.jp2
e11c7d64ecc09dd71e782af2327b1d42 982607
lakshmi_s_Page_51.jp2
b2b0f31cfeea641b4d99d7034b872049 1042071
lakshmi_s_Page_52.jp2
430537b5926676fe020f0d8b9c419c7a 940570
lakshmi_s_Page_53.jp2
596838bbb936a93fdbac1be4130c60ab 997181
lakshmi_s_Page_54.jp2
8f5d2525adc9885f5c1f164962db871b 539213
lakshmi_s_Page_55.jp2
d64fe113716b7a1924acbffc3e0f3c24 901161
lakshmi_s_Page_56.jp2
b26bb173560a0a41d1ec4d05a9fd5494 981736
lakshmi_s_Page_57.jp2
b6dada93aeb81f3414e62fd6e94bb5a8 254758
lakshmi_s_Page_58.jp2
4cb658d958881944d2881eeaaa391831 983380
lakshmi_s_Page_59.jp2
d1be800bf84d9fff58659b4b24566da2 509203
lakshmi_s_Page_60.jp2
imagejpeg 41f2b325b2b1b6795fe5cdc1f3d2b033 41173
lakshmi_s_Page_01.jpg
JPEG1.2 ab9e51409c9bbd4e707ddcb0bbbce9a4 25290
lakshmi_s_Page_01.QC.jpg
ecef0691e5abc30635285f541866b12e 22231
lakshmi_s_Page_02.jpg
JPEG2.2 5f36ab418ed5e19414524d3305783fac 19284
lakshmi_s_Page_02.QC.jpg
3aefe34e6b1d4861f8d2ef9f5a505330 21410
lakshmi_s_Page_03.jpg
JPEG3.2 8a3ff9a905bbb676348ead930b686158 18950
lakshmi_s_Page_03.QC.jpg
8ef1405d177c313aeaac42b66db36bd3 67349
lakshmi_s_Page_04.jpg
JPEG4.2 33c7ac631cd2868180c1bc9cc5dd0838 34923
lakshmi_s_Page_04.QC.jpg
a7d19d7a2ea036732121111ee1987229 94224
lakshmi_s_Page_05.jpg
JPEG5.2 ea14feb22166960abac020978a774680 35962
lakshmi_s_Page_05.QC.jpg
1d6568d6f30661d05a1c7d9784f44054 113182
lakshmi_s_Page_06.jpg
JPEG6.2 10af322de8fb6719b6421910a140802d 39426
lakshmi_s_Page_06.QC.jpg
a1aadb6cb038f379c9959ee152460ef2 57167
lakshmi_s_Page_07.jpg
JPEG7.2 32eccb069f9271d5dc47ce7251066927 29276
lakshmi_s_Page_07.QC.jpg
d810dc959c14dbedbe868a3d3a026473 92326
lakshmi_s_Page_08.jpg
JPEG8.2 7447e092409651d28549736721789011 42462
lakshmi_s_Page_08.QC.jpg
34956a7a80b73aa5b859a80c7df3f29d 47688
lakshmi_s_Page_09.jpg
JPEG9.2 f0083800b7369f9c4456f364c28bc415 28495
lakshmi_s_Page_09.QC.jpg
baa9f57c7a4c2d7748c520ff1d87c587 106088
lakshmi_s_Page_10.jpg
JPEG10.2 5563aaf9cd15ede4c968f848d00444a7 48116
lakshmi_s_Page_10.QC.jpg
2c7b2350f330dbfed6a087a3e022993c 117350
lakshmi_s_Page_11.jpg
JPEG11.2 9cd35685eb1183ddf984fc714bfaeb10 52310
lakshmi_s_Page_11.QC.jpg
bfc4e2fd2dfdbb8cc70dfce2d4780e3a 101913
lakshmi_s_Page_12.jpg
JPEG12.2 73b17ac6251727ac810bcaae34e8bc4d 46728
lakshmi_s_Page_12.QC.jpg
f0bc27cf9977ef625c5c59a64d18a444 76295
lakshmi_s_Page_13.jpg
JPEG13.2 1aedc580edca6db726c8dd8e58b1de70 39175
lakshmi_s_Page_13.QC.jpg
cb6809fb4f60b0449b70d4a2ffe46a91 115644
lakshmi_s_Page_14.jpg
JPEG14.2 25df179b37f60659883f9fd44a81c24f 52022
lakshmi_s_Page_14.QC.jpg
66dde352ea75aacbcf80221f68170c18 74488
lakshmi_s_Page_15.jpg
JPEG15.2 9078b869b5efc9a6a1192a3177ecaf62 38005
lakshmi_s_Page_15.QC.jpg
a8f7af07cf93f96957af981243a55f95 107043
lakshmi_s_Page_16.jpg
JPEG16.2 6797ca95fa1f1db697a4669c79ce04f8 48407
lakshmi_s_Page_16.QC.jpg
81f017998ccddf5816dc727a0a42ecba 87758
lakshmi_s_Page_17.jpg
JPEG17.2 522a5fec0b0610b4c6ebccf418769901 43238
lakshmi_s_Page_17.QC.jpg
62f6a9421fc201f4e01c7e2159644d98 119644
lakshmi_s_Page_18.jpg
JPEG18.2 4720115a83c675c3b23a99af7d0b7d0a 51663
lakshmi_s_Page_18.QC.jpg
d7d184954e1a068a43305cce0bce8e32 114725
lakshmi_s_Page_19.jpg
JPEG19.2 5f0960921d4006071599a5b7d97f1ff7 50537
lakshmi_s_Page_19.QC.jpg
420d453b432d3871281b2a016acfe164 117373
lakshmi_s_Page_20.jpg
JPEG20.2 dfe50d366fd97327f555392a6c6fc3ff 52671
lakshmi_s_Page_20.QC.jpg
764732fb96cbac5fa65fd4824fe1f790 125286
lakshmi_s_Page_21.jpg
JPEG21.2 ee1f163070d1df114df84fcc3326cf4b 52180
lakshmi_s_Page_21.QC.jpg
b1f94bd50cc733a6ddab00321f967bd8 122537
lakshmi_s_Page_22.jpg
JPEG22.2 49d194cfd01462c1791824630f17f27b 52503
lakshmi_s_Page_22.QC.jpg
51fd05a70273e5f5e512938c9238d2ce 121378
lakshmi_s_Page_23.jpg
JPEG23.2 22c323fa0921e19aaaed72af193d05c6 53197
lakshmi_s_Page_23.QC.jpg
f5cc3ed62f0e74112434e99aaccd0a85 118400
lakshmi_s_Page_24.jpg
JPEG24.2 e7db2ba4f0ef57318ad3d7467cef015a 52839
lakshmi_s_Page_24.QC.jpg
e4912c401cda90fe9bd24b1bc605af02 44915
lakshmi_s_Page_25.jpg
JPEG25.2 4dba254aa9d6e3a71fb5258c680dfbd5 27135
lakshmi_s_Page_25.QC.jpg
1f4485b4d6ec3517fbf477c17df85628 87506
lakshmi_s_Page_26.jpg
JPEG26.2 c3c35a5db36db6ed01a5b13cb634843d 42370
lakshmi_s_Page_26.QC.jpg
bc0d5f68cb20c3240581bcb0bee46cc5 97832
lakshmi_s_Page_27.jpg
JPEG27.2 b38d939ae8ebff1f63faa28fd2fa50cf 44791
lakshmi_s_Page_27.QC.jpg
60774bc8484e217d52e02dc803889dba 117276
lakshmi_s_Page_28.jpg
JPEG28.2 1bada56447d073c9fa4af9fd900d98b1 50121
lakshmi_s_Page_28.QC.jpg
bccadb68805a2710b801d218e2011e21 111776
lakshmi_s_Page_29.jpg
JPEG29.2 9404f0b3370b12e5f85671f9c4ad63d4 48699
lakshmi_s_Page_29.QC.jpg
245741474bafe93c29e0aa957272aee8 98883
lakshmi_s_Page_30.jpg
JPEG30.2 e021a70b153afca29a4d437b5f2b742c 44688
lakshmi_s_Page_30.QC.jpg
8e3ecab138b52daf2ab1bc6d84254875 84504
lakshmi_s_Page_31.jpg
JPEG31.2 e5faa570d65c7d68d33137a695a946fa 40554
lakshmi_s_Page_31.QC.jpg
3b6f352cecbfa7258eecb81a3d05b453 88700
lakshmi_s_Page_32.jpg
JPEG32.2 670907fde251ff046d7d41367fc4ad74 40664
lakshmi_s_Page_32.QC.jpg
9353d28707b3d28bd8da34a52a92dabe 105169
lakshmi_s_Page_33.jpg
JPEG33.2 454ef79397261cc1c7782616ea32ee16 47931
lakshmi_s_Page_33.QC.jpg
e7d2c65b38aa90a8a82e641e5dec74db 99709
lakshmi_s_Page_34.jpg
JPEG34.2 eb3593f4b47a7881ddf6d7334c6ae779 44081
lakshmi_s_Page_34.QC.jpg
a8e01b044985f806ec89b7c40b31be99 100904
lakshmi_s_Page_35.jpg
JPEG35.2 4b90a0528214d1941138930f133ff41f 46869
lakshmi_s_Page_35.QC.jpg
2be2d5464dbcda0a0f9ea44a7d55cba5 102887
lakshmi_s_Page_36.jpg
JPEG36.2 96351d4cbd93d767e8da04721b7489c0 46631
lakshmi_s_Page_36.QC.jpg
3c6e32d5a5025bb74cde0c06c394acf3 102219
lakshmi_s_Page_37.jpg
JPEG37.2 62ed7aa7dbbc70857a40cfd5015c2f68 48163
lakshmi_s_Page_37.QC.jpg
18b1c994aaac1f702b525d0080556bf1 80915
lakshmi_s_Page_38.jpg
JPEG38.2 2db1fb881f05f27a5afbed4c09819f48 40047
lakshmi_s_Page_38.QC.jpg
7299d8c98468af32d38574a0ce3f4294 98143
lakshmi_s_Page_39.jpg
JPEG39.2 ad4cd5cb5ecc7b481c0095a5b7c8d571 44874
lakshmi_s_Page_39.QC.jpg
8e73ba76905a31eff978826b1ead873a 79657
lakshmi_s_Page_40.jpg
JPEG40.2 cc1dac0ba6ebc8c38a6923af2355ef53 40293
lakshmi_s_Page_40.QC.jpg
4add8323fe804b36a38dbc3f992065f5 92382
lakshmi_s_Page_41.jpg
JPEG41.2 43e49c342b96c2c1291051878974fe3b 44580
lakshmi_s_Page_41.QC.jpg
e8fa571764b1521eb65c7045eb3a39e0 140349
lakshmi_s_Page_42.jpg
JPEG42.2 58f721739fce5b2038590affd7d49bb4 54538
lakshmi_s_Page_42.QC.jpg
1abc21ad8a66ebde79a483ae3055afc7 70266
lakshmi_s_Page_43.jpg
JPEG43.2 bf6002c086cd5355af796d2960560445 37197
lakshmi_s_Page_43.QC.jpg
0e0f2b57d5d3b037e3411bcb1ffed634 115031
lakshmi_s_Page_44.jpg
JPEG44.2 e6b638b4627eaa22ff1b4eeb68902703 50327
lakshmi_s_Page_44.QC.jpg
e585ce39e2cd5374937c506170757061 121912
lakshmi_s_Page_45.jpg
JPEG45.2 1f98b5b53639095bc43560bd2a2d8a41 53845
lakshmi_s_Page_45.QC.jpg
461eba6a0e7fbcee44e97d6903fd137e 109205
lakshmi_s_Page_46.jpg
JPEG46.2 c4ac34fef921716bde57f6186f9eb37a 49403
lakshmi_s_Page_46.QC.jpg
60cee0e8a69c61fc072843b7a5adca55 114152
lakshmi_s_Page_47.jpg
JPEG47.2 70c8665417bcd47b59abc18c2df92131 52037
lakshmi_s_Page_47.QC.jpg
4636e4d6892f3fbce0e6d5085f87cc22 121659
lakshmi_s_Page_48.jpg
JPEG48.2 67ba20594bedc0310acf77dffe585e54 55127
lakshmi_s_Page_48.QC.jpg
5940acaeddaecf169448c52cccb3165f 107701
lakshmi_s_Page_49.jpg
JPEG49.2 f7e15d048c734fff965b42d5ae3e0a4d 48669
lakshmi_s_Page_49.QC.jpg
4ae5221924c47b1deb9b1cc1ea2f950e 120815
lakshmi_s_Page_50.jpg
JPEG50.2 e83e6993bc7e5bc9e63f8f32b7eb5010 53415
lakshmi_s_Page_50.QC.jpg
8be04870aa71aae585c06d9a91cec19d 110327
lakshmi_s_Page_51.jpg
JPEG51.2 63abcaff4f781ea313d400c2b351f727 50098
lakshmi_s_Page_51.QC.jpg
cb32da922678ead83fd59b21f03c1148 117898
lakshmi_s_Page_52.jpg
JPEG52.2 db732de294822651d8fc449d9ce7703b 52955
lakshmi_s_Page_52.QC.jpg
f17f22dc867f250e2fbb78decd8ffd24 108379
lakshmi_s_Page_53.jpg
JPEG53.2 88391c92826c43e79c6a8c6913445b1e 48609
lakshmi_s_Page_53.QC.jpg
b2c28f0458a3f87ec24df91410e51d64 113131
lakshmi_s_Page_54.jpg
JPEG54.2 347e7a22cb47c20450f63d89ff13c89b 51394
lakshmi_s_Page_54.QC.jpg
043c7d5bb1d390cf8d74e984cc87265f 71169
lakshmi_s_Page_55.jpg
JPEG55.2 52a10adbd46fd94ba3a80bbb4cc1369a 36078
lakshmi_s_Page_55.QC.jpg
d078150dcc9f2ba1eeb6bfe6a738740d 103386
lakshmi_s_Page_56.jpg
JPEG56.2 f6a246ae218799a186573fe06a2cdf2a 47200
lakshmi_s_Page_56.QC.jpg
c48291db221bf6bc55bea30c9f59f87f 109770
lakshmi_s_Page_57.jpg
JPEG57.2 b32c515a085cb44e8d1cb3bb86da8241 51335
lakshmi_s_Page_57.QC.jpg
f048ea04837e94ecbdf31fc02ff10737 44783
lakshmi_s_Page_58.jpg
JPEG58.2 eee6fbe9b4bfbf453d51e86b65140a71 26918
lakshmi_s_Page_58.QC.jpg
d10c9039f1dff8066b338f31350e4701 109691
lakshmi_s_Page_59.jpg
JPEG59.2 dc861bf88f3bf1970019c165457f84d8 45847
lakshmi_s_Page_59.QC.jpg
d40cd4d80722a322885e013907fc4119 68012
lakshmi_s_Page_60.jpg
JPEG60.2 0b04840b1d240b14e8867ab88aba95cb 35451
lakshmi_s_Page_60.QC.jpg
THUMB1 imagejpeg-thumbnails 9c8b3cad1a7b464004cf56377d304315 20077
lakshmi_s_Page_01thm.jpg
THUMB2 a139d93f77a5ea345c219367581d98c2 18250
lakshmi_s_Page_02thm.jpg
THUMB3 9432a1840ed210a195ca83196437adca 18082
lakshmi_s_Page_03thm.jpg
THUMB4 c048ab70d99568252631d7aa9f5acf27 23278
lakshmi_s_Page_04thm.jpg
THUMB5 a3a4f6effd18500b478efef49cb9a7db 23129
lakshmi_s_Page_05thm.jpg
THUMB6 bd1313091be9e58f527f0b22aaa5aec2 24154
lakshmi_s_Page_06thm.jpg
THUMB7 23df72dce8bc6129b832679bdbd47c56 21398
lakshmi_s_Page_07thm.jpg
THUMB8 df2e4eab860d9feb7a03507e1e17e5bf 26083
lakshmi_s_Page_08thm.jpg
THUMB9 95ddc807c6cf5b0b1d7254cadea9d494 20987
lakshmi_s_Page_09thm.jpg
THUMB10 5bfcb46b3e0777d87e4c0257bd9f93a2 27336
lakshmi_s_Page_10thm.jpg
THUMB11 05ecba101b51b818d5b565c1a8c481ec 28945
lakshmi_s_Page_11thm.jpg
THUMB12 8bb23bea996d6526b28e44dd5616af8b 27354
lakshmi_s_Page_12thm.jpg
THUMB13 28ab2d301068a53cd0847d08eda14514 25609
lakshmi_s_Page_13thm.jpg
THUMB14 bc30ebf90cb3ef4072133f88bea8850e 28556
lakshmi_s_Page_14thm.jpg
THUMB15 8b3237f4b661650179eeec27b00814cb 24072
lakshmi_s_Page_15thm.jpg
THUMB16 e6e2c2518ab15d53bbbf71e66e8178a9 27611
lakshmi_s_Page_16thm.jpg
THUMB17 b66493f38523a152eac020fb3be4a292 26163
lakshmi_s_Page_17thm.jpg
THUMB18 06f2271a36fc7889eb76fa03f64c15e3 28190
lakshmi_s_Page_18thm.jpg
THUMB19 4f78b1f86758551076a5353d4d5cce06 28264
lakshmi_s_Page_19thm.jpg
THUMB20 9f04d86cff55f0c078710e0053736d34 28686
lakshmi_s_Page_20thm.jpg
THUMB21 ab2f4d539454d1dd6c9f71cf587d03b0 28602
lakshmi_s_Page_21thm.jpg
THUMB22 85d39500d745d6e2d4144996fa5e96b8 28919
lakshmi_s_Page_22thm.jpg
THUMB23 060a7c88bb613ff557884e502a50fd90 28997
lakshmi_s_Page_23thm.jpg
THUMB24 c16564ede1b8aa5e93cfe9544330ca51 28927
lakshmi_s_Page_24thm.jpg
THUMB25 f0f9b8d53e524777146199ed1bd92399 20467
lakshmi_s_Page_25thm.jpg
THUMB26 01f673ccfc94e9c47ecb06ed2b596720 25196
lakshmi_s_Page_26thm.jpg
THUMB27 a15edc300392538b3b583bd701f41eb0 26645
lakshmi_s_Page_27thm.jpg
THUMB28 144a47a6cfd4cce6744e75c5f36b713c 28873
lakshmi_s_Page_28thm.jpg
THUMB29 937b01a39319772e96ff6425f60a3bc9 27849
lakshmi_s_Page_29thm.jpg
THUMB30 910592d717ec8820324d001a7cac4fcc 26802
lakshmi_s_Page_30thm.jpg
THUMB31 3babd4718fea324f2e48e24aa511bd7d 25433
lakshmi_s_Page_31thm.jpg
THUMB32 f4ab63709a2e2c5e66c6b9533264b624 25595
lakshmi_s_Page_32thm.jpg
THUMB33 70e594b30dd87b39328373d78d3c4eb1 27574
lakshmi_s_Page_33thm.jpg
THUMB34 1335547ca2873c541f2eb3189fc92471 26467
lakshmi_s_Page_34thm.jpg
THUMB35 e6d6d525d334d8756baabf5701c072f9 27777
lakshmi_s_Page_35thm.jpg
THUMB36 de4bf61db8a02bd46315a7df94bda434 27947
lakshmi_s_Page_36thm.jpg
THUMB37 7acaa71ef3a33b4ac2559eb68a951cb9 27411
lakshmi_s_Page_37thm.jpg
THUMB38 dd27513d46a340e1023808de131d70de 25163
lakshmi_s_Page_38thm.jpg
THUMB39 19ed369ca98b9ff4f0bd35e35446cc6f 26925
lakshmi_s_Page_39thm.jpg
THUMB40 20ddcee9aad70c3d4bb24ac449d76140 25172
lakshmi_s_Page_40thm.jpg
THUMB41 1cb2d80de1703ef5337efb1686bb5cdf 26730
lakshmi_s_Page_41thm.jpg
THUMB42 185cac11e3b63bf13fac7b550fbb8797 28784
lakshmi_s_Page_42thm.jpg
THUMB43 5b15cf40b53e9d1b0475a5eef62c0b77 24964
lakshmi_s_Page_43thm.jpg
THUMB44 54467b306bd755975774403ef27e8096 28439
lakshmi_s_Page_44thm.jpg
THUMB45 204505f467e1297f66684472db4ab284 29340
lakshmi_s_Page_45thm.jpg
THUMB46 5bb0f1d9b12de5276c9bec882a966807 27828
lakshmi_s_Page_46thm.jpg
THUMB47 d4b0766a140f68939a6ba491fd63dae0 28534
lakshmi_s_Page_47thm.jpg
THUMB48 1c5844738570b220f73d6b1b7df0f393 29187
lakshmi_s_Page_48thm.jpg
THUMB49 67047e7c23df5dfe60bbc640e992e1f3 28136
lakshmi_s_Page_49thm.jpg
THUMB50 314c4b20274f1173d73a8c908c0d1663 28701
lakshmi_s_Page_50thm.jpg
THUMB51 7a79bf65df945fc030f1b41deb7c3072 28331
lakshmi_s_Page_51thm.jpg
THUMB52 f17e2941a1a854731cad150d87cee16d 28866
lakshmi_s_Page_52thm.jpg
THUMB53 18ee69be89fa62e5617c543718427cd9 28161
lakshmi_s_Page_53thm.jpg
THUMB54 998b29d607a87ce422a5df9cd0202876 28512
lakshmi_s_Page_54thm.jpg
THUMB55 2909787eb53e25c022cfef10cace1221 24032
lakshmi_s_Page_55thm.jpg
THUMB56 9234e0d5a22a6df45d11db05558a97e4 27080
lakshmi_s_Page_56thm.jpg
THUMB57 7eaead863f924c65408ce7670b2fd754 28515
lakshmi_s_Page_57thm.jpg
THUMB58 a4544310c49587da6ce6a96aba3cb2e3 20420
lakshmi_s_Page_58thm.jpg
THUMB59 42de5cdd6bac027e3a5b4ebdeb2e9fed 26492
lakshmi_s_Page_59thm.jpg
THUMB60 548c06232b92c3a5d5f327998272f192 23500
lakshmi_s_Page_60thm.jpg
TXT1 textplain 264ec5b20bb875958e47fc3571948cfe 393
lakshmi_s_Page_01.txt
TXT2 72a19694b156a46830440482768dbf43 99
lakshmi_s_Page_02.txt
TXT3 d8543357c5953afed3399d6e09e78018 76
lakshmi_s_Page_03.txt
TXT4 19a74b088efbd3995e4e0444bd8cdeaa 905
lakshmi_s_Page_04.txt
TXT5 0219f51780739eb57ff7980f8589f0ae 2074
lakshmi_s_Page_05.txt
TXT6 88f3acc24e0fd1e81c1047442ef57401 2618
lakshmi_s_Page_06.txt
TXT7 aaac0dde10959d493a6b511c9c128389 916
lakshmi_s_Page_07.txt
TXT8 371b3aa51dea3b56c05653ebf828709d 1445
lakshmi_s_Page_08.txt
TXT9 ebb1c053a533b038a94a24c4510bdbc8 521
lakshmi_s_Page_09.txt
TXT10 69107b5d702eea00c2d38e4e7162c727 1733
lakshmi_s_Page_10.txt
TXT11 df1cc3f463c9e2944db2d9e6aa6ae58d 1834
lakshmi_s_Page_11.txt
TXT12 118b0f769184203954ed305f7594d370 1601
lakshmi_s_Page_12.txt
TXT13 e1525e1273df07f37de7dbbae78c39d8 847
lakshmi_s_Page_13.txt
TXT14 3b6fb625b833257ce9b9d6540495d709 1848
lakshmi_s_Page_14.txt
TXT15 5c68cdfc4e939dd47d09833aa629c8a6 1047
lakshmi_s_Page_15.txt
TXT16 50d54049ac70fef73102f43d44647779 1691
lakshmi_s_Page_16.txt
TXT17 609b12ec084de1f80be0e9e56610cd6d 1264
lakshmi_s_Page_17.txt
TXT18 d5567423154964229556fb55e10fe7b6 2042
lakshmi_s_Page_18.txt
TXT19 6fbac42c40bf07b07983b6bb80ed550e 1837
lakshmi_s_Page_19.txt
TXT20 894605ecaa21a38a890dce2cc1f37f55 1783
lakshmi_s_Page_20.txt
TXT21 8f56ce796dd12438a56c3b7115fbcac0 2172
lakshmi_s_Page_21.txt
TXT22 e7f1c9b8b3bb0e97feac66ad8ee3960a 2092
lakshmi_s_Page_22.txt
TXT23 1e3cad599f0f948cc746aefa3cd2d34e 1892
lakshmi_s_Page_23.txt
TXT24 9b2ea7099a20ba3e4d25acf2fe28073a 1817
lakshmi_s_Page_24.txt
TXT25 643d8c91550b62ecfdf60ebc9865fce7 491
lakshmi_s_Page_25.txt
TXT26 c4703e5fe6ce7f800805d3e47cf0016d 1341
lakshmi_s_Page_26.txt
TXT27 c270a6082a6c4dd528ffa0699248e43e 801
lakshmi_s_Page_27.txt
TXT28 61863d7af39a879addab8e76f489c3d8 1962
lakshmi_s_Page_28.txt
TXT29 8ac04aebec198ab247cad6d6331c7da2 1787
lakshmi_s_Page_29.txt
TXT30 e0efdfdc9f88d71d1d5ff60ab626496c 1478
lakshmi_s_Page_30.txt
TXT31 075124a6e21c34890891b24417b8197f 1480
lakshmi_s_Page_31.txt
TXT32 6c4c2987f22558ce38882384afc8a6ff 1890
lakshmi_s_Page_32.txt
TXT33 10e2acb2dc454f4c9c9bcded0113cded 1649
lakshmi_s_Page_33.txt
TXT34 a71c44f88b98ddf7b34a819e591b9c50 2023
lakshmi_s_Page_34.txt
TXT35 49720015119a505b7ddc369a53b96a71 1144
lakshmi_s_Page_35.txt
TXT36 5256143d8bf23f8961dd14e01bed4ce4 1453
lakshmi_s_Page_36.txt
TXT37 96807cd7956d3eb65338cb09557b3536 1557
lakshmi_s_Page_37.txt
TXT38 93c86467e487cbf88d95d1259b8c2ebf 1377
lakshmi_s_Page_38.txt
TXT39 95cb86bb8331e9004e682613a4e166d9 1564
lakshmi_s_Page_39.txt
TXT40 863accc33082620ef14f502922cc3d73 1117
lakshmi_s_Page_40.txt
TXT41 4d7f22253a5e6213cb3002efd013a48f 1467
lakshmi_s_Page_41.txt
TXT42 4547fc6ac071ee3ea5bffdc1489cdd81 2546
lakshmi_s_Page_42.txt
TXT43 36a7aaee2549ca378e942a17594af36f 982
lakshmi_s_Page_43.txt
TXT44 2d166099016012fd0acd6816e9ba2c3b 1940
lakshmi_s_Page_44.txt
TXT45 0fba9d29e60dc4aae060728ab480e325 1953
lakshmi_s_Page_45.txt
TXT46 c34022859157473ed8175ae637a7f4f7 1737
lakshmi_s_Page_46.txt
TXT47 d5a4b503cb1f18897b1cc617aa7f7d3b 1811
lakshmi_s_Page_47.txt
TXT48 13e5d5b2ada417d58d882d0d5a4a1619
lakshmi_s_Page_48.txt
TXT49 129fd60b615674984ba3a7260a692ee1 1722
lakshmi_s_Page_49.txt
TXT50 608cde00c8f1c64d96ea22d9b017897a 1893
lakshmi_s_Page_50.txt
TXT51 f1801646e66b02403cfdd61de819cb06 1754
lakshmi_s_Page_51.txt
TXT52 5322549e2f29f8fb359163b963cd8f86 1870
lakshmi_s_Page_52.txt
TXT53 f71b45d29d78ca59c9b1917a85e32c50 1864
lakshmi_s_Page_53.txt
TXT54 6e5360f029fb4e25dde66bc8af721d8d 1794
lakshmi_s_Page_54.txt
TXT55 470f5038e3567c0d9f3a1ee5c48eb475 1013
lakshmi_s_Page_55.txt
TXT56 7b30775af3bb150e205362970eaa8b19 1641
lakshmi_s_Page_56.txt
TXT57 b1d8de5f95ca9573c4e4578d3b0b6a63 1753
lakshmi_s_Page_57.txt
TXT58 406afdd95e4279956370dc7b0f09db16 520
lakshmi_s_Page_58.txt
TXT59 4b5942b14c819e8d77291f4182daff80
lakshmi_s_Page_59.txt
TXT60 57c74015b3cc7bf1e6b26815f75944c2 912
lakshmi_s_Page_60.txt
PDF1 applicationpdf d57327a28134be2f1fd097a27be3736b 1233195
lakshmi_s.pdf
METS2 unknownx-mets df1a171ec32b70c62b47b84ed81aecf8 70020
UFE0000559_00001.mets
METS:structMap STRUCT1 physical
METS:div DMDID ADMID Web-based ORDER 0 main
PDIV1 1 Main
PAGE1 Page i
METS:fptr FILEID
PAGE2 ii 2
PAGE3 iii 3
PAGE4 iv 4
PAGE5 v 5
PAGE6 vi 6
PAGE7 vii 7
PAGE8 viii 8
PAGE9 ix 9
PAGE10 10
PAGE11 11
PAGE12 12
PAGE13 13
PAGE14 14
PAGE15 15
PAGE16 16
PAGE17 17
PAGE18 18
PAGE19 19
PAGE20 20
PAGE21 21
PAGE22 22
PAGE23 23
PAGE24 24
PAGE25 25
PAGE26 26
PAGE27 27
PAGE28 28
PAGE29 29
PAGE30 30
PAGE31 31
PAGE32 32
PAGE33 33
PAGE34 34
PAGE35 35
PAGE36 36
PAGE37 37
PAGE38 38
PAGE39 39
PAGE40 40
PAGE41 41
PAGE42 42
PAGE43 43
PAGE44 44
PAGE45 45
PAGE46 46
PAGE47 47
PAGE48 48
PAGE49 49
PAGE50 50
PAGE51 51
PAGE52 52
PAGE53 53
PAGE54 54
PAGE55 55
PAGE56 56
PAGE57 57
PAGE58 58
PAGE59 59
PAGE60 60
STRUCT2 other
ODIV1
FILES1
FILES2



PAGE 1

WEB-BASED SEARCH ENGINE FOR RADIOLOGY TEACHING FILE By SHRIRAM LAKSHMI A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLOR IDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE UNIVERSITY OF FLORIDA 2002

PAGE 2

Copyright 2002 by Shriram Lakshmi

PAGE 3

To My Wonderful Family

PAGE 4

ACKNOWLEDGMENTS I would like to thank Dr. Doug Dankel for encouraging me to start this work and providing timely guidance and help. I would also like to thank Dr. Chris Sistrom for spending a lot of time in reviewing our work and providing excellent feedback and suggestions right from the beginning. I also thank Dr. Joe Wilson for being in my committee and giving suggestions from time to time. I would like to thank Ms. Aarathi Balakrishnan for her valuable criticism and suggestions on my work. I also thank Mr. Balaji Krishnaprasad for providing me with the necessary resources for completing this thesis. I would also like to thank all my friends who have helped me in so many ways throughout my graduate study. Finally, I would like to thank my family for all their love, encouragement and support throughout my life. iv

PAGE 5

TABLE OF CONTENTS page ACKNOWLEDGMENTS.................................................................................................iv LIST OF FIGURES..........................................................................................................vii ABSTRACT.....................................................................................................................viii CHAPTER 1 INTRODUCTION...........................................................................................................1 1.1 Problem Domain......................................................................................................2 1.2 System Design.........................................................................................................3 1.3 Design Advantages..................................................................................................5 1.4 Document Structure.................................................................................................6 2 BACKGROUND AND TECHNOLOGY OVERVIEW.................................................7 2.1 Existing RTF System...............................................................................................7 2.2 Technologies Used...................................................................................................9 2.2.1 Relational Databases....................................................................................10 2.2.2 Structured Query Language.........................................................................10 2.2.3 Stored Procedures........................................................................................11 2.2.4 Triggers........................................................................................................12 2.2.5 Active Server Pages.....................................................................................13 2.2.6 VBScript.......................................................................................................14 2.2.7 Open-Data-Base-Connectivity (ODBC)......................................................14 2.3 Summary and What Is Next...................................................................................16 3 SYSTEM FEATURES AND DESIGN.........................................................................17 3.1 Main Database Design...........................................................................................17 3.1.1 TF Table.......................................................................................................18 3.1.2 Images Table................................................................................................19 3.1.3 Other Tables.................................................................................................20 3.1.3.1 Contributors........................................................................................20 3.1.3.2 Patient_Info.........................................................................................21 3.1.3.3 Modality..............................................................................................21 v

PAGE 6

3.1.3.4 Rating..................................................................................................21 3.1.3.5 Hospital_Info......................................................................................21 3.2 WebSE Walkthrough.............................................................................................22 3.2.1 Free-Text Search..........................................................................................22 3.2.2 Advanced Search..........................................................................................28 3.3 Summary and What Is Next...................................................................................29 4 INSIDE WebSE.............................................................................................................30 4.1 Indexing System.....................................................................................................32 4.1.1 WordList......................................................................................................32 4.1.2 WordInfo......................................................................................................33 4.1.3 ColumnInfo..................................................................................................35 4.1.4 ImageCount..................................................................................................35 4.2 Query Processing: How It Works..........................................................................36 4.2.1 Traditional System.......................................................................................37 4.2.2 WebSE.........................................................................................................38 4.2.2.1 Single term search...............................................................................38 4.2.2.2 Multiple term search...........................................................................39 4.3 Ranking..................................................................................................................40 4.3.1 Frequency of Occurrence.............................................................................41 4.3.2 Column of Occurrence.................................................................................41 4.3.3 Adjacency of Words.....................................................................................42 4.3.4 Weighted Words..........................................................................................42 4.4 Implementation of Ranking...................................................................................42 4.4.1 Ranking in Single Word Searches...............................................................42 4.4.2 Ranking in Multiple Word Searches............................................................43 4.4.2.1 Additional strategy..............................................................................43 4.4.2.2 Implementation...................................................................................45 4.4 Summary and What Is Next...................................................................................46 5 CONCLUSIONS AND FUTURE WORK....................................................................47 5.1 WebSE...................................................................................................................47 5.2 Extensions..............................................................................................................48 5.2.1 Synonyms.....................................................................................................48 5.2.2 Spelling Mistakes.........................................................................................48 5.3 Summary................................................................................................................49 LIST OF REFERENCES...................................................................................................50 BIOGRAPHICAL SKETCH.............................................................................................51 vi

PAGE 7

LIST OF FIGURES Figure page 1-1 High Level Design........................................................................................................4 2-1 Existing RTF Architecture............................................................................................8 3-1 Main Database Design................................................................................................18 3-2 Basic Search Screen....................................................................................................22 3-3 Search Results Screen.................................................................................................23 3-4 Case Details Screen....................................................................................................25 3-5 View Images Screen...................................................................................................26 3-6 Another View Images Screen.....................................................................................27 3-7 Advanced Search Screen............................................................................................29 4-1 The 3-Layer Index Architecture.................................................................................31 4-2 Index Tables and Relationships..................................................................................32 4-3 RTF and Index Tables................................................................................................34 vii

PAGE 8

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science WEB-BASED SEARCH ENGINE FOR RADIOLOGY TEACHING FILE By Shriram Lakshmi December 2002 Chair: Dr. Douglas D. Dankel II Major Department: Computer and Information Science and Engineering Teaching in any field can be successful only with the help of good illustrations. This is especially true in the medical field, where even an excellent professor can sound vague without good examples. As a result, the academic medical community started collecting together illustrative images of cases in a database. This database is known as the Radiology Teaching File or simply RTF and has proven to be a great asset to academic radiology departments. However, as the size of the database increases, maintaining the database and searching the database for cases with a particular characteristic become time-consuming and cumbersome. Hence, there is a need for a system that can provide an easy to use interface to maintain the database and also a retrieval system that can fetch the relevant cases. The focus of this thesis is to provide an easy-to-use search engine that can search the RTF efficiently. viii

PAGE 9

This search engine, called WebSE (Web-based Search Engine), was developed with usability, versatility, and scalability in mind. The user can search the database of cases using an easy-to-use general search, like Google (the web search engine), or the user can use a more complicated interface to run a more precise search. In the former, a relevancy ranking mechanism helps the user to see the most important cases first. As a result, this tool helps in making teaching and learning more effective. ix

PAGE 10

1 CHAPTER 1 INTRODUCTION The academic medical community has always felt the need for a database of cases for illustration. A description of any anomaly is never complete without a few practical case reviews. These case reviews are a must not only for making the students better understand the pathologic conditi on at hand, but also for servi ng to expose the student to some real world cases before the student begins to actually diagnose real patients. A case usually includes a few X-rays, and/or other forms of medical images, of the affected part of the patient. There is no substitute for th ese images while teaching. The statement A picture speaks a thousand words is especially true here, since medical images are complex and the students attenti on needs to be driven to the right details. To make teaching effective and complete every medical faculty member needs a database of cases illustrating each pathologic condition. This database of illustrative cases is called the Radiology Teaching File, referr ed to as RTF from this point on. The RTF can be used to store not only cases that are a perfect exampl e of a condition but also cases that are exceptions to the general behavior. This makes the new generation of physicians aware of those exceptions and mo re prepared to handle them. Considering that physicians constantly add cases to the RTF and search it frequently for cases of a particular kind, it is imperative to have a system that can make this process easy and quick. This leads to bett er use of the physicians time and also leads to overall effectiveness of teaching.

PAGE 11

2 1.1 Problem Domain The RTF is an ever-growing database, with a wide variety of cases. Each case has its own set of images. For example, the RTF at the Department of Radiology, University of Florida, has around twenty thousand patient cases with each case having six images on average. In addition, new cases are added on a daily basis. This makes maintaining all the image files and the database of cases cumbersome. Additionally, the size of the RTF places a burden on the physicians when they attempt to locate a case with a particular characteristic. Traditionally, cases of interest were selected from a list of all available cases, which could be viewed sequentially or searched based on special codes assigned to the cases, like the ACR code. 1 Obviously, this involves examining many unrelated cases before finding the right case. In such a scenario, the RTF seems to create more problems than it solves. The RTF can prove useful only if the physicians find it easy to add cases, maintain the database, and retrieve cases quickly [Dugas et al. 2001] and efficiently. The user-interface of the system should not require extensive computer knowledge, since its primary users are physicians. At the same time, the system should be quick, efficient, and require little or no maintenance. This thesis looks closely at the problem of case retrieval, while Ms. Aarathi Balakrishnan examines the user-interface and maintenance of the system in her thesis Design and Analysis of User-Interface for RTF [Balakrishnan 2002]. The case retrieval system is referred to as WebSE, standing for Web-based Search Engine, from this point on. 1 Members of the American College of Radiology evolved the ACR codes to code images in a teaching film library.

PAGE 12

3 The Microsoft Access versio of the RTF, currently used at the University of Florida, has a limited search capability. Though this system provides an easy-to-use interface, it is greatly inhibited by the limitations in the underlying software. WebSE aims to eliminate the shortcomings of this system by using a new design while at the same time providing a user-friendly interface. 1.2 System Design WebSE was developed with the above requirements in mind. It is a database-driven, web-based search engine. The two main components of the overall design are the Data Server and the Web Server. Figure 1.1 illustrates the overall system design. First, let us look at the two components, examining how they work together. The data server consists of two databases. The main database consists of all the patient cases and is called the RTF. In addition to this database, there is another database that is added to speed up query processing in the RTF. This is called the pseudo-index database, which contains meta-data about the data in RTF. The data server stores all the data using SQL Server. The web server accepts queries from the World Wide Web sent to it from client browsers. A set of scripts, written in VB Script, resides on this web server. These scripts are also called Active Server Pages or ASP. Clients send their requests through a browser to the web server. The scripts on the web server interact with the data server and respond to the client by sending the search results back to the client.

PAGE 13

4 Figure 1-1 High Level Design Let us examine how the two components work together at run-time. Firstly, the user inputs his query on the browser interface on his computer. This query (which is usually a set of keywords) is sent over the World Wide Web to the web server. When the web server receives a request from a client, it initiates an instance of the search script and executes it. This script, which runs totally on the web server, first establishes a connection with the data server, running SQL Server, using an ODBC (Open Data Base Connectivity) connection. Then the script analyses the query sent by the user and generates a SQL query.

PAGE 14

5 The script sends this generated SQL query to the database server through the ODBC connection. The instance of the SQL Server, running on the server, accepts this query from the script, executes the query, and returns the result through the ODBC connection to the ASP script. The ASP script formats the result into HTML (Hyper Text Markup Language), which is sent over the World Wide Web back to the client browser initiating the request. The browser on the clients machine renders the HTML source code and presents it to the user in the proper format. 1.3 Design Advantages There are a number of design advantages to this architecture. First, the user-interface offers a whole array of options to the user for searching. The user can enter a few keywords like in Google or can use an advanced search to search by image modality, age, or any other possible field in the database. The user is also provided with a special code search through which the user can build an ACR code [ACR 94] and search the RTF using that code. This ensures versatility of the search engine. Second, the user-interface is designed by modeling the existing system. This eases the transition for the physicians from the old system to WebSE. For the additional components in this design, like the keyword search page, the user-interface is modeled after existing popular interfaces, like the interface presented by the Google search engine. This ensures user-friendliness of WebSE. Third, when the user is presented with the results, the results are sorted in order of decreasing relevance. In this way, the user receives the most relevant case on top of the list. This can be a big time-saver for the user when the database is really large or when the retrieved result set has many cases.

PAGE 15

6 This system is also scalable. The use of pseudo-indexing greatly reduces the processing time involved in searching the database. In addition to making the search quicker, it also makes the search less dependent on the size of the database. This ensures that system performance does not degrade as the size of RTF increases. Finally, since this WebSE is a web-based system, to take advantage of the RTF, users only need to have a browser installed on their client systems. Also, the client can run the browser on any platform. 1.4 Document Structure The following chapters look at the design and performance of WebSE and its interface. Chapter 2 discusses the existing teaching file system at the University of Florida and the technologies needed to develop WebSE. Chapter 3 looks at WebSE from the user-perspective, examining the user-interface. Chapter 4 provides an in-depth view of the architecture of WebSE. Chapter 5 presents conclusions and proposes future extensions.

PAGE 16

CHAPTER 2 BACKGROUND AND TECHNOLOGY OVERVIEW The existing implementation of the RTF system in the Department of Radiology at University of Florida has a reasonably good user-interface. Unfortunately, the underlying technology used, inhibits its capabilities to a great extent. This chapter looks at the design of this system and examines its pros and cons. Then, it examines the software tools used for implementing WebSE. 2.1 Existing RTF System The existing implementation of the teaching file uses a Microsoft Access database. The user-interface was also implemented in Microsoft Access, with the help of Microsoft Access forms. The user-interface was basic and simple with most of the functionalities provided. Since many people wanted access to this well-designed teaching file it was expanded to include sharing. Before looking at the architecture of this design, we need to understand linking as defined in Microsoft Access Help. In an Access database, linking data enables you to read and in most cases, update data in the external data source without importing. The external data source's format is not altered so that you can continue to use the file with the program that originally created it, but you can add, delete, or edit its data by using Microsoft Access as well. You can also link tables from other Microsoft Access databases. For example, you might want to use a table from another Microsoft Access database that is shared on a network [Microsoft 2000b]. This is particularly useful if you want to store all of your tables in one 7

PAGE 17

8 database on a network server, while keeping forms, reports, and other objects in a separate database that is copied among users of the shared database. This concept of linking was used here. The actual data was stored in a separate database. This database had only the tables of the teaching file. The forms and other user-interface related material was placed in another database, which is referenced as the UI database from this point on. This UI database linked to the main database to access the data. Now, the data was present in one central location, and the UI was distributed among all the clients. Figure 2-1 Existing RTF Architecture The following are some of the distinct advantages of this architecture. The UI access database is one small file, which can be distributed among the users of the teaching file easily. Since the data is in one central location, the problem of duplication is eliminated. In other words, everyone works on the same data, even though they are accessing it from different systems.

PAGE 18

9 Any change made to the data by any user (additions, deletions, or updates), will be seen by all the other users immediately. This proved to be an excellent, initial implementation of the teaching file. However, this design is not free from drawbacks, the following are some of its disadvantages. The most significant of these is that Microsoft Access does not support a large number of simultaneous users, thereby limiting the number of people that can access the database at one time. Every person who wants to access the teaching file must have Microsoft Access installed on their system, which also means that the user must be running Windows. Updating the user-interface is very cumbersome, because it involves updating all the copies of the client database. This may even be impossible in some cases. Another drawback is that, Microsoft Access is intended for small databases only. Beyond a certain size, its performance significantly degrades. With the size of the teaching file growing all the time, this would become a problem eventually. With more users wanting to access the teaching file, a system that could support more concurrent users and a system that could be platform independent needed to be developed. Also, the system needed to be scalable and to retain the user-friendly features of the current system. 2.2 Technologies Used This section describes the various technologies used in WebSE. In WebSE, the main data resides on Microsoft SQL Server, also known as Sequel Server [Microsoft 2000c]. The following are some of the reasons why SQL Server was chosen for WebSE. It is an industrial strength database management system, which can handle far bigger databases than Microsoft Access. It provides a more robust access to multiple users at the same time. It supports stored procedures and triggers, which make database maintenance easier.

PAGE 19

10 It also has support for handling permissions and security. One can set permissions for an individual column or for the whole database. This is especially important because, some tables have confidential patient information, which need to be kept private. RTF stores the data in a relational format. SQL Server supports Relational Databases. 2.2.1 Relational Databases Although there are different ways to organize data in a database, relational databases are one of the most effective. Relational database systems are an application of mathematical set theory to the problem of effectively organizing data. In a relational database, data is collected into tables (called relations in relational theory) [Elmasri and Navathe 2001]. A table represents some class of objects that are important to an organization. For example, a company may have a database with a table for employees, another table for customers, and another for stores. Each table is built of columns and rows. Each column represents some attribute of the object represented by the table. Each row, or tuple, represents an instance of the object represented by the table. When organizing data into tables, you can usually find many different ways to define tables. Relational database theory also defines a process called normalization, which ensures that the set of tables you define organizes your data effectively. 2.2.2 Structured Query Language To work with data in a database, you have to use a set of commands and statements (language) defined by the DBMS software. Several different languages can be used with relational databases; the most common is SQL (Structured Query Language) [Elmasri and Navathe 2001]. The American National Standards Institute (ANSI) and the

PAGE 20

11 International Standards Organization (ISO) define software standards, including standards for the SQL language [Microsoft 2000c]. SQL Server 2000, used for WebSE, supports the Entry Level of SQL-92, the SQL standard published by ANSI and ISO in 1992 [Microsoft 2000c]. The dialect of SQL supported by Microsoft SQL Server is called Transact-SQL (T-SQL) [Microsoft 2000c]. Note that SQL Server is the name of the database system sold by Microsoft and is very different from SQL, which is a database access language standard. The two terms should not be confused with each other. In addition to the above-mentioned features, WebSE also uses a couple of other features provided by SQL Server. They are Stored Procedures and Triggers. A brief description of these follows. 2.2.3 Stored Procedures When you create an application with SQL Server, the Transact-SQL programming language is the primary programming interface between your applications and the SQL Server database. When you use Transact-SQL programs, two methods are available for storing and executing the programs. You can store the programs locally and create applications that send the commands to SQL Server and process the results, or you can store the programs as stored procedures in SQL Server and create applications that execute the stored procedures and process the results [Microsoft 2000c]. Stored procedures in SQL Server are similar to procedures in other programming languages in that they can Accept input parameters and return multiple values in the form of output parameters to the calling procedure or batch. Contain programming statements that perform operations in the database, including calling other procedures.

PAGE 21

12 Return a status value to a calling procedure or batch to indicate success or failure (and the reason for failure). The benefits of using stored procedures in SQL Server rather than Transact-SQL programs stored locally on client computers are They allow modular programming. You create a procedure once, store it in the database, and call it any number of times in your program. They allow faster execution. If the operation requires a large amount of Transact-SQL code or is performed repetitively, stored procedures can be faster than batches of Transact-SQL code. They are parsed and optimized when they are created, and an in-memory version of the procedure can be used after the procedure is executed the first time. Transact-SQL statements are repeatedly sent from the client each time they run and are compiled and optimized every time SQL Server executes them. They can reduce network traffic. An operation requiring hundreds of lines of Transact-SQL code can be performed through a single statement that executes the code in a procedure, rather than by sending hundreds of lines of code over the network. They can be used as a security mechanism. Users can be granted permission to execute a stored procedure even if they do not have permission to execute the procedure's statements directly. 2.2.4 Triggers SQL Server provides two primary mechanisms for enforcing business rules and data integrity: constraints and triggers. A trigger is a special type of stored procedure that automatically takes effect when the data in a specified table is modified [Microsoft 2000c, Elmasri and Navathe 2001]. A trigger is invoked in response to an INSERT, UPDATE, or DELETE statement. A trigger can query other tables and can include complex Transact-SQL statements. The trigger and the statement that fires it are treated as a single transaction, which can be rolled back from within the trigger. If a severe error is detected (for example, insufficient disk space), the entire transaction automatically rolls back.

PAGE 22

13 Triggers are useful in the following ways Triggers can cascade changes through related tables in the database; however, these changes can be executed more efficiently using cascading referential integrity constraints. Triggers can enforce restrictions that are more complex than those defined with CHECK constraints. Unlike CHECK constraints, triggers can reference columns in other tables. For example, a trigger can use a SELECT from another table to compare to the inserted or updated data and to perform additional actions, such as modify the data or display a user-defined error message. Triggers can also evaluate the state of a table before and after a data modification and take action(s) based on that difference. Multiple triggers of the same type (INSERT, UPDATE, or DELETE) on a table allow multiple, different actions to take place in response to the same modification statement. 2.2.5 Active Server Pages The Web server component of WebSE supports Active Server Pages (ASP). The server-side scripts are written in VBScript [Microsoft 2000a]. A more detailed description of server side scripts and ASP follows. Microsoft Active Server Pages (ASP) is a server-side scripting environment that you can use to create interactive Web pages and build powerful Web applications [Microsoft 2002]. When the server receives a request for an ASP file, it processes server-side scripts contained in the file to build the Web page that is sent to the browser. In addition to server-side scripts, ASP files can contain HTML (including related client-side scripts) as well as calls to COM components that perform a variety of tasks, such as connecting to a database or processing business logic. A server-side script starts execution when a browser requests a .asp file from the Web server. The Web server then calls ASP, which processes the requested file from top to bottom, executes any script commands, and sends a Web page to the browser.

PAGE 23

14 Because the scripts run on the server rather than on the client, the Web server does all the work involved in generating the HTML pages sent to browsers. Server-side scripts cannot be readily copied because only the result of the script is returned to the browser. Users cannot view the script commands that created the page they are viewing. ASP supports various scripting languages including VBScript, which is used by WebSE. 2.2.6 VBScript Microsoft Visual Basic Scripting Edition, the newest member of the Visual Basic family of programming languages [Microsoft 2000a], brings active scripting to a wide variety of environments, including Web client scripting in Microsoft Internet Explorer and Web server scripting in Microsoft Internet Information Server. It has all the good features of Visual Basic making it easy to use and learn. At the same time, it blends very well with all the other components of WebSE. VBScript talks to host applications using ActiveX Scripting. With ActiveX Scripting, browsers and other host applications do not require special integration code for each scripting component. ActiveX Scripting enables a host to compile scripts, obtain and call entry points, and manage the namespace available to the developer. ActiveX Scripting is used in Internet Explorer and in IIS. VBScript and ActiveX Scripting can also be used as a general scripting language in other applications. 2.2.7 Open-Data-Base-Connectivity (ODBC) The scripts interact with the Data Server over an ODBC connection. ODBC is a standard definition of an application-programming interface (API) used to access data in relational or indexed sequential access method (ISAM) databases [Microsoft 2001]. SQL Server supports ODBC as one of the native APIs for writing C, C++, and Visual Basic applications that communicate with SQL Server.

PAGE 24

15 ODBC enables a database to become an integral part of an application. SQL statements can be incorporated into the application, allowing the application to retrieve and update values from a database. Values from the database can be placed in program variables for manipulation by the application. Conversely, values in program variables can be written to the database. ODBC enables applications to access a variety of data sources, including a wide range of relational databases and local ISAM data. The ODBC driver for SQL Server is included with SQL Server. ODBC defines a call-level interface, or CLI. A CLI is defined as a set of function calls and their associated parameters. A CLI definition uses a native programming language to call functions; therefore, a CLI requires no extensions to the underlying programming language. This contrasts with an embedded API, such as Embedded SQL, where the API is defined as an extension of the source code for a programming language, and applications using the API must be precompiled in a separate step. ODBC aligns with the following specification and standard for relational SQL database CLI definitions: The X/Open CAE specification Data Management: SQL Call-Level Interface (CLI) [Microsoft 2001] and ISO/IEC 9075-3:1995 (E) Call-Level Interface (SQL/CLI) [Microsoft 2001]. SQL Server programs that are written using the ODBC API communicate with SQL Server through C function calls. The SQL Server-specific versions of the ODBC functions are implemented in a SQL Server ODBC driver. The driver passes SQL statements to SQL Server and returns the results of the statements to the application. ODBC applications are also interoperable with drivers for heterogeneous data sources.

PAGE 25

16 2.3 Summary and What Is Next This chapter examined the existing RTF system and its advantages and disadvantages. It discussed the reasons for choosing SQL Server over Access and examined the various other technologies like Relational Database, Stored Procedures, SQL, Triggers, IIS, ASP, VBScript, and ODBC. The next chapter looks at WebSE from a user point of view and also includes a walkthrough.

PAGE 26

CHAPTER 3 SYSTEM FEATURES AND DESIGN The previous chapters presented a basic design overview of WebSE. This chapter takes a close look at WebSE from the user perspective. The actual inner design for improving performance and scalability and also the implementation details are left for the next chapter. Here, we first examine the underlying database design to better understand the domain of WebSE. Then, we take a screen-by-screen walk through and analyze the functionality offered by WebSE. The User-interface is also scrutinized at every point. 3.1 Main Database Design For any database driven project to be successful, the underlying database design needs to be sound. Needless to say, this was the first phase of implementing this thesis, to analyze the tables and normalize the database design. Figure 3.1 captures the essence of the database design. It shows the main database tables and the relationships existing between them. A brief description of the significance of the main tables and their main fields follows. Note that this is a description of the data tables alone and does not include the pseudo-index tables that are added to the database for improving scalability and performance. 17

PAGE 27

18 Figure 3-1 Main Database Design 3.1.1 TF Table The TF table, or Teaching File, is the main table that holds all the main data related to the cases, excluding image related information and other information like patient information, hospital information, etc. One tuple exists in this table for every different case. This table includes: CASE_NUMBER: This is a unique number assigned to every case in the teaching file. In database terms this is referred to as the primary key of this table. Patient_id: This is a unique number assigned to every patient. Every patient can be uniquely identified with this id. The patient related information is stored in another table.

PAGE 28

19 AGE: This is the age of the patient in days. Since patients can include infants, whose age will be in days, the age is stored in days. For other patients, to get the age in years, we can always divide the age in days with 365. SEX_MALE: This is the gender of the patient. For males, a value of one is assigned and a zero indicates a female patient. ANATOMIC_CODE: This is an anatomical code assigned to every case that is an ACR code developed by the American College of Radiology. PATHOLOGIC_CODE: This is also part of the ACR code. It represents the Pathology. DIAGNOSIS: This field has the final diagnosis of the case. The physician provides this value when adding the case to the teaching file. CHIEF_COMPLAINT: In other words, Symptoms, or the patients initial complaint. COMMENT: This field holds any other comments that the physician may want to add about the case. CONTRIBUTOR: The id of the physician who entered this case. This is stored because it is sometimes useful to know which physician entered this case into the teaching file. IMAG_ANON: Holds the image-anonymous status. Rating_id: This is an optional field that the physician can enter, which identifies how good an example this case is, on a scale of 1 to 10. HXNO: This is the id of the hospital in which this case occurred. The actual hospital details are stored in another table. PRIVATE: This is a flag that sets a case as private or public. Only the contributor of the case can view a private case. 3.1.2 Images Table Every case can have more than one associated image. This table stores the information related to all the images for all the cases. The case_number has a foreign key constraint and references the TF tables case_number. The fields of this table are as follows

PAGE 29

20 CASE_NUMBER: This is the unique id of the case. MODALITY: This is the modality of the image. Every medical image has a modality associated with it. Examples of image modality are X-ray, MRI, CTScan, etc. IMAGE_ANNOT: This is optional information that can be added to any image to describe something about the image. ORIG_IMAGE_LOC: This is the full network path of the image before it was added to the teaching file. IMAGE_DIR: It should be noted that in the teaching file, the images are all stored in a separate set of directories. The database only holds the location to the images and not the images themselves. This is the directory in which the image is presently stored. IMAGE_FILE_NAME: The current file name of the image file. IMAGE_NUMBER: Since a given case can have more than one image, the images should be ordered among themselves to preserve the order of the images. This field contains this information. The above two tables are the most commonly searched tables in the database, since most users are interested in searching for cases and their images. In addition to this main information, other information is stored in a different set of tables. These tables are rarely searched and only by those who have the required privileges. A brief description of these tables follows. 3.1.3 Other Tables 3.1.3.1 Contributors This table holds information about all the contributors. Contributors are physicians with permissions to add to, delete from, and modify cases in the RTF. It contains information regarding their login, password, name, and the type of permissions they have to access the teaching file.

PAGE 30

21 3.1.3.2 Patient_Info This table holds patient information and is not open for public viewing. It should be noted that only the cases are open to the public, the patient information is to be kept confidential at all times. The contents of this table can be viewed only by the administrator or by contributors with appropriate permissions. 3.1.3.3 Modality This is a lookup table storing all the image modalities and their abbreviation. Medical images can be of different types and the type is called its modality. Examples of image modality are X-Ray, CT Scan, etc. 3.1.3.4 Rating This lookup table stores rating descriptions for all possible ratings in the teaching file. A rating can be given to every case inserted into the RTF. It describes the quality of the case some cases are very good illustrations of an anomaly and some are not that good. Having one lookup table for the ratings makes it possible for all the physicians adding cases to the RTF, to use the same scale for rating. 3.1.3.5 Hospital_Info As the name suggests, this table holds hospital information. Right now, only the hospital name is present in this table, as only that was thought necessary, but this table can be extended to hold additional information about hospitals. In addition to the above tables, there are a number of tables holding temporary information and lookup information. The tables mentioned above form the RTF.

PAGE 31

22 3.2 WebSE Walkthrough WebSE offers a wide range of search options to the user from very simple free-text search to a more complex advanced search. Let us look at each of type of search and examine its features. 3.2.1 Free-Text Search Whenever a user wants to search for a case, the first item that comes to his/her mind is the name of the anomaly itself or a set of keywords describing this anomaly. The free-text search offers the user the ability to search the database of cases with these keywords, providing the user with easy and effective communication with the search engine. The user-interface is very similar to Google. This style is used because almost everyone is familiar with this interface, making it ideal for the medical community. Figure 3-2 Basic Search Screen

PAGE 32

23 Figure 3.2 shows the first screen of WebSE. The user enters a set of keywords separated by spaces in the provided text box, then clicks on the search button or presses the enter key. This submits the keywords to RankedSearch1.asp on the web server. Figure 3-3 Search Results Screen

PAGE 33

24 The script in the asp file first establishes a connection with the database using an ODBC connection, then dynamically constructs a SQL query with the keywords submitted by the user. This query is sent over the ODBC bridge to SQL Server. SQL Server executes the query and returns to the asp file the results as a set of records. The results are wrapped in HTML and sent to the users browser. Figure 3.3 shows the screen displaying the search results. In the figure, the user has submitted the phrase cell carcinoma. As can be seen in the search results, all the cases listed at the top contain the search phrase in the diagnosis field. This is because the search results have been sorted for relevance. We examine the strategies used to rank the cases based on relevance later in this chapter. The actual implementation of the strategy is explained in the next chapter. There are three views that show the results to the user, during free-text search. These are Search Results View, Case Details View, and Case Images View. The user can see the search results in the first view. This screen does not give all the details about all the cases. Only a few fields of each case are shown. If the user wants the complete details about a case, he can browse through the list and click on View Case Details. This takes him to the second screen, which shows all the textual information available, about that particular case. The user can also click on Total Images to view the images one by one. This takes him to a screen where he can browse through the images for that case.

PAGE 34

25 Figure 3-4 Case Details Screen As can be seen in Figure 3.4, the view case details page displays all the information shown on the search page. In addition to this information, details like comments are also shown here. The user is provided with two options: he can either view the images of this case by clicking on Total Images or he can go back to the search page by clicking on Back to Search Page to look at the other cases. Figure 3.5 shows the browse images view. Usually, a case contains more than one image. When a user wants to see all the images related to a case, he is taken to the screen shown in this figure. Here, the first image of the case is shown. Below the image, four buttons are provided for navigation through the images.

PAGE 35

26 Figure 3-5 View Images Screen Note that in Figure 3.5, the first and previous buttons are disabled. This is because, the first image for this case is displayed, so it is logical to disable them. In Figure 3.6, the next and last buttons are disabled showing that it is the last image of the case. When showing images that are neither first, nor last, all of the buttons are enabled. This ensures that unnecessary confusions and problems do not arise while browsing through the images. Again, at any point the user has the option of going back to the search results view or to the case details view. This is true for all the three views the user can, at any time switch from one view to the other. This not only makes browsing through the set of cases easy, it also makes the process of browsing efficient.

PAGE 36

27 Figure 3-6 Another View Images Screen The search results view is customizable by the administrator. When the search results are shown, the script first checks the value of a display parameter for each field in the RTF. Only when this parameter is true for a field is it shown in the results. These parameters can be set or unset by the administrator, thereby changing the default behavior of the search results view. Additionally, some users might want to use the RTF to test their skills. To do this, they would like to view all the details of a case except the diagnosis. This way, they can try to determine what the diagnosis might be and then see if their answer matches the one in the RTF. To facilitate this, a link is provided at the bottom of the page that allows the user to search without the diagnosis. This offers the same user-interface, except that the

PAGE 37

28 diagnosis field is not shown until the user selects it [Harvard Medical School 2001, University of Washington 2000]. 3.2.2 Advanced Search Sometimes, the user may want to add other criteria into the search. For example, in addition to entering a search phrase, the user might want to limit his search results to cases involving only infants. Obviously, this cannot be done with the general search. For such queries, an additional search page called the advanced search is provided. Figure 3.7 shows the advanced search screen. Here, in addition to specifying keywords for the search, the user can specify an ACR code, Image Modality, Age range, Sex, and Contributor. None of these fields are mandatory. The user can enter values into any combination of fields and obtain search results. For example, the user can search for all cases contributed by a contributor A, where the patients sex is female. Or, the user might enter a search phrase heart cancer and mention an age range for male patients. This helps in finding very specific cases quickly leading to better use of the information in the RTF. The user can also search by ACR code. An ACR code builder is also provided to help the user build an ACR code for the type of cases for which he is looking. Here, the search results are shown using the same display scripts used by the basic search. Searching without diagnosis is also provided. By using a uniform style for displaying search results, WebSE eases the understanding the user-interface.

PAGE 38

29 Figure 3-7 Advanced Search Screen 3.3 Summary and What Is Next We first examined the database structure of RTF. Then, the two types of search capabilities were examined. The basic search provides a quick and easy way of finding the information while the advanced search helps the user to specify more complicated queries. Search results are displayed in decreasing order of relevancy helping the user look at the most relevant cases first. Also, a uniform and user-friendly interface helps the physician to concentrate on cases, rather than the search tool itself. In the next chapter we examine the interior design of WebSE.

PAGE 39

CHAPTER 4 INSIDE WEBSE All the features of WebSE shown in the previous chapters are not of much use if the search itself takes a long time to execute. The goal of this thesis was to design a search engine that could search through a database of cases efficiently and present the results in a user-friendly manner. The main tables to be searched include the TF table and the images table. The main columns that concern us are diagnosis, chief_complaint, and comment. All these columns contain more than one word, (i.e., there may be many phrases or sentences within each field) where each phrase or sentence contains more than one word. To search for keywords within a field, a normal like % query is used. This type of search capability is provided by most database systems today. Using this feature of the database system might seem to be a very simple solution and is fairly easy to implement. However, the like % query has a big disadvantage in that it is quite slow for larger databases. This is especially undesirable since the teaching file is an ever-growing collection of cases. Hence, a different search strategy has been implemented in WebSE. The search is made much faster if we index every individual word in a field. Since database systems do not generally support this type of indexing, the problem is reduced to indexing parts of a field, using a database system that only supports indexing of whole fields. 30

PAGE 40

31 To accomplish this, a set of special tables is created. These tables hold metadata about the data in the RTF tables. These tables, which will be referred to as pseudo-index tables from now on, act as one layer of the index. These tables are again indexed using SQL Servers index support, which forms the first layer of index. Above all, the main data tables are indexed on the primary key, which acts as the third layer of index. Figure 4.1 shows this 3-layer index architecture. Figure 4-1 The 3-Layer Index Architecture This 3-level indexing expedites the search process greatly on systems that do not support full-text indexing. Now, whenever the user submits a query, the whole indexing

PAGE 41

32 system can be used to directly fetch the relevant cases from the TF table in optimal time. Now, we examine the index tables and the whole system at runtime. 4.1 Indexing System There are 3 main tables that compose the pseudo-index, which is the second layer in the 3-layer indexing system. The Figure 4.2 shows the main tables and the relationships that exist between them. A brief description of each table, its significance, and its columns follow. Figure 4-2 Index Tables and Relationships 4.1.1 WordList This table tracks all the words occurring in the database. It keeps information relating to a word and is independent of the case in which the word occurs. For example, let us say that the word heart occurs many times in the database across many cases. Then, there will be only one entry for the word heart in this table. The information about each individual occurrence of the word heart is stored in another table. The following are the columns in this table.

PAGE 42

33 word: This field has a list of all the words in the database. If a word occurs in any of the indexed fields, in any record, in the database, then it occurs in this field. There is a clustered index built on this column. Clustered indices are supported by most of the commercially available database systems, including SQL Server. When an index is present in a column of a table, an index structure is maintained by SQL Server to point directly to the data page in which the word resides. Also, since the index is a clustered index, the data is physically sorted and arranged in that order in the data pages. A search for any one word on this index table requires only two I/O operations. One for fetching the index page and the other for fetching the data page. word_id: This is a unique id given to every word in this table. This is used when storing extra information about a word in another table. count: This tracks the number of times a word has occurred in the database. The count signifies the importance of the word. Very important words tend to occur far less often than common words and hence will have lower counts. weight: A weight is assigned to each word based on the number of times it occurs in the database. Commonly used words like of and the have a very low weight assigned to them. On the other hand, less frequently used words like carcinoma have a larger assigned weight. 4.1.2 WordInfo This table tracks each instance or occurrence of every word in the wordlist table. For every word in the wordlist table, there are one or more tuples in the WordInfo table. word_id: This is the unique identifier assigned to every word in the wordlist table. case_number: The case number uniquely identifies any given case in the teaching file database. Here, it stores the case_number of the case in which the word in the word_id field exists. column_id: This is a unique identifier given to every column in the teaching file that is being indexed. Here, it represents the column in which the occurrence of the word exists. position: The position field stores the position of a word in a column in a given case. Within a given field, the first word is assigned a position value of and subsequent words get position values incremented by . This field is useful when searching for an exact phrase match.

PAGE 43

34 It must be noted that this table has no primary key. This enables multiple occurrences of words within a case and within a column. Each occurrence of the word has its own entry in this table. Figure 4-3 RTF and Index Tables

PAGE 44

35 Figure 4.3 shows the index tables along with the main tables of RTF and the index built on each table. 4.1.3 ColumnInfo This table stores the list of columns in the TF database that need to be indexed. This list of columns is stored in a separate table for increased flexibility and customizability. Column_id: This is an auto-number generated by the database. It serves as a unique identifier for a column. Column_name: This is the actual name of the column as it occurs in the design of the TF table. This is used by the index populating scripts to obtain the name of the column to be indexed. Column_weight: This is a weight assigned to each column, by the administrator. This weight of the column plays an important role in ranking the query results. The weight of a column signifies its importance compared to other columns in the table. For example, assigning a weight of to Diagnosis and a weight of to Comments means that the diagnosis field is twice as important as the Comments field. 4.1.4 ImageCount ImageCount is an additional table storing pre-calculated image statistics about every case. It is not an index table, but just serves as an extra table to speed up querying. It stores the number of types of images of each modality for every case. This statistical information is to be given to the user when displaying search results. However, calculating these values at runtime slows down query processing. Hence, the values are pre-calculated and stored in this table. At runtime, the values can be obtained from here. Note that, initially, the index tables are populated based on the data in the RTF at that time. However, as updates are made to RTF, the index tables also need to reflect these changes. Keeping the index tables updated is very important since all the searching

PAGE 45

36 is done through these tables. Otherwise, this could lead to cases that are never shown in the search results or non-existent cases appearing in the results. In WebSE, this is done with the help of triggers. Whenever a new case is added to the RTF, triggers add the information about the words in the new case to the index tables. Similarly, when a case is deleted, the related information is deleted from the index tables. When an update is made to a case, all the information related to that case is deleted from the index tables and then updated data is re-inserted. We have seen the various components of the indexing system. Let us now see how the 3-layer indexing system performs during run-time. 4.2 Query Processing: How It Works When a user submits a search phrase, the web server creates an instance of the search engine script. Then, the search phrase entered by the user is split into individual words by assuming space as the delimiter. The script stores all the individual words in an array. Each word is scanned for (wildcard). Depending on the whether wildcard characters are found or not, a normal SQL query or a wildcard SQL query is built. This query is submitted to the Database engine. Every database engine develops a query execution plan before actually executing a query [Stonebraker and Hellerstein 1998]. An execution plan is nothing but a sequence of low-level operations that the database engine needs to perform to execute the query. The plan takes into consideration the indexes present on all the tables involved and also the sizes of the tables involved and optimizes the plan for time taken to execute the query. Here also, once the SQL Server receives a query from the web script, it analyzes the query and tries to formulate a plan. Due to the existence of the index system, the database engine executes the query in a particular way.

PAGE 46

37 We now examine how the query processor executes the query and see why it is quicker than a traditional execution plan. Note that there may be some slight variations in the actual processing path taken by the query processor of SQL Server from what is mentioned below [Stonebraker and Hellerstein 1998]. However, the main steps taken remain the same. We first look at the execution path for a traditional like % query and then look at our system. 4.2.1 Traditional System Let us examine the following SQL query, analyzing its run-time execution. Select from TF where diagnosis like %heart% or chief_complaint like %heart% or comment like %heart% The above query looks for all cases that have the word heart mentioned somewhere within the diagnosis, chief_complaint, or comment fields. Note that, % is the wildcard character in SQL. So, the job of the database engine now is to find all the cases that have the word heart mentioned anywhere in the given fields. But, the database engine does not know which cases in the TF table have the word occurrence. So, the only way to execute this query will be to look at each and every case in the TF table. To do this, it has to fetch every data page that contains any record of the TF table and search the relevant fields for the word heart. Since every page-fetch involves I/O, which is very expensive in terms of time, the whole process can be considerably slow. As the size of the database increases, the total number of I/O operations required to execute a query, also increases linearly. Let us now compare this with what happens in WebSE.

PAGE 47

38 4.2.2 WebSE Query processing in WebSE is performed in two parts, first we look at queries where the user only has one search term and then we look at multiple-term queries. 4.2.2.1 Single term search Let us look at the same example used previously. To search for all the cases with the word heart, the search engine presents a different SQL query to the database engine. This query makes use of the 3-layer indexing system and offers some speedup to query processing. The query is as follows Select from TF where case_number in (Select case_number from wordInfo where word_id in (Select word_id from wordlist where word=heart)). Query execution always starts with the innermost query. So, the first step in the execution plan is to search the wordlist table for the word heart. If the word is present in the wordlist table, then it exists in the TF database. Otherwise, it can be safely assumed that the word does not exist in the TF too. Since there is an index on this tables word field, SQL Server maintains a separate index structure that has pointers to the exact location of every record in its data page, based on the word. SQL Server uses this index structure implicitly for fetching the right tuple from the wordlist table. So, the data page of the wordlist table that has the word heart in it is retrieved. Now, the word_id from this record can be read. The execution now proceeds to the next step and executes the next query in the hierarchy. The goal of this step is to fetch all the records in WordInfo table that have the word_id selected from the previous step. Again, there is a clustered index on the word_id field here. So, the database system knows which data pages need to be retrieved and only

PAGE 48

39 those are fetched. Also, since the index is clustered, all records with the same word_id are stored close to each other, making the number of pages to be fetched, minimal. The case_numbers are selected from this set of records. Now the outer most query needs to be executed. The database engine needs to fetch all cases that have the given case_numbers. Since the TF table also has an index on the case_number field, the database engine retrieves the required data pages into memory. Thus, the required cases are obtained from the TF without having to search the entire TF table. Of course, there has been an overhead of two extra I/O operations to fetch the index table pages. But considering that TF is a very large table and fetching all the pages into memory takes a significant amount of time, the time saved is much greater. Note that the same logic applies to fetching the image information from the images table. The other advantage of this design is that, as the size of the database grows, the time to execute the query does not increase linearly (as with the traditional approach). The total I/O done decides the time taken to execute the query. In the traditional approach, as the size of the database increases, a search would involve more page fetches, making the execution time increase linearly with the size of the database. Here, no matter what the size of the database, only the required pages will be fetched. However, this does not mean that the execution time is totally independent of the database size. All the pages that have cases from the result set will be fetched anyway and so execution time depends on the number of cases in the final search results. 4.2.2.2 Multiple term search If there is more than one word in the search string, then the search process is done as a simple extension of the single-term search. First, a search is performed for all the

PAGE 49

40 individual words, on the wordlist table; similar to the way it is done in a single-term search, which results in a list of word_ids. These word_ids are used to get a list of case_numbers for each word_id. A different list results for each of the word_ids. Every list represents the list of cases having the word with the given word_id somewhere in some column of the TF table. Now, these lists are merged together resulting in a final list of case_numbers, which is used to retrieve the cases from the TF. Again, the rules of indexes still hold, making all searches optimal. WebSE also supports queries with wildcards. In a search phrase, the user can add a wildcard before or after any word. For example, the user can search for cardio* *nomy. This searches for all words starting with cardio and all words ending with nomy. The steps involved in executing wildcard queries are exactly the same. It should be noted that the basic idea used in the search strategy above is that any case having at least one of the words from the search phrase, in some column or the other, is relevant. This is a well-established technique that is the basis of all the web-search engines and information retrieval systems. In addition to searching the TF table efficiently, WebSE also ranks the search results based on relevancy. We now examine how ranking of cases is performed in WebSE. 4.3 Ranking Ranking is the process of ordering the search results in decreasing order of relevancy [Eylen 1997]. There are many ways of computing relevancy of a case for a particular search phrase. The following techniques are used within WebSE.

PAGE 50

41 4.3.1 Frequency of Occurrence Given a search word, a case that has more occurrences of that word in its record is assumed to be more relevant than a case having fewer occurrences of the same word. All major search engines on the web use this principle of word frequency for ranking their results. This statistical approach to ranking in a combination with other strategies has proven to be extremely successful. 4.3.2 Column of Occurrence In addition to using the frequency of occurrence, search engines like Altavista [Eylen 1997] and Google consider the location of the word as well. Documents having the search word in a heading get more weight than documents having the word at the bottom. This idea has been employed within our indexing system. However, the direct application of this principle does not make much sense in our indexing system. For example, the name of a disease might occur either in the bottom or the top of the description of the disease (in any column). Obviously, giving more weight to a case where a disease name occurs closer to the beginning is really not significant. However, consider the following example. Let us say that a user enters a search term, which is the name of a disease. Suppose there are three cases that have the disease name mentioned. Each case has the word occurring in a different column. It would be a good idea to give cases with the word occurring in one column more importance than the others. This is where the ColumnInfo table is used. As mentioned before, this table has a column called columnWeight. Each column that is to be indexed is given a weight in this column. This is a way of assigning importance or relevance for a column. Based on these weights, the case with the word occurring in the most important column can obviously be considered as the most relevant among its peers.

PAGE 51

42 4.3.3 Adjacency of Words Yet another strategy used to order relevant results is to use an exact phrase match. This is also used in all the major web search engines and is a pretty simple approach. If a user is looking for the search phrase heart cancer, then it makes sense that cases having the words heart and cancer adjacent to each other in the same order are more relevant than cases having just one of the two words. Also, this principle can be applied recursively. For example, if the user enters the phrase, swollen heart anomaly, then, as earlier mentioned, the cases with the phrase swollen heart anomaly will be more relevant than cases with only one individual word. In addition to this, cases with partial phrase match should also obtain a higher weight than cases with individual occurrences (i.e., cases with swollen heart or heart anomaly obtain a higher weight than ones with only heart or anomaly or both of them in any other order). 4.3.4 Weighted Words If a search phrase has more than one word, then among the cases with only one word, the cases with the more important word obtains a higher rank than cases with a not so important word. 4.4 Implementation of Ranking 4.4.1 Ranking in Single Word Searches Now, for single word searches the weight of the word is multiplied with the weight of the column in which each occurrence of the word exists. The weight of the word is obtained from the wordlist table and the weight of the column in which it occurs can be obtained from the ColumnInfo table. After completion of the above procedure, the result is a list of case_numbers and corresponding weights. There might be multiple

PAGE 52

43 occurrences of a word in a given case, which means the initial list would have had weights of each individual word occurrence. To consolidate the list, we group the result set by case_number and sum the weights. Mathematically, it can be represented as For each case, the final weight = [ (column_weight) ] word_weight For all occurrences of the word in a case Finally, we have a list of case_numbers and a final weight for each case. This is used to rank the cases such that the case with highest weight gets the best rank. It is obvious that cases with more occurrences of the word in the more important columns are in the top of the list and cases with fewer words in lesser important columns are towards the end and all other combinations exist somewhere in the middle of the list. Changing the weights assigned to columns can change the behavior of the search results greatly. 4.4.2 Ranking in Multiple Word Searches 4.4.2.1 Additional strategy Ranking multiple-word search queries is very different from ranking single-word queries. Here, in addition to the above-mentioned strategies (for single-word queries), the relative ordering of the words within the search phrase also needs to be taken into consideration. The cases where the exact search phrase is found are obviously most relevant and should get the maximum weight. This forms the first tier of cases. In the second tier are cases having all the words of the search phrase, where these words are scattered across the case. Again, column weights still hold here. The third tier consists of all cases having more than one word but not all of them. Ranking of cases in the third tier can be a bit complicated. Cases which having a larger number of higher weighted words in more weighted columns get a better rank than cases

PAGE 53

44 having fewer words in lower weighted columns. Tweaking the weights of the columns can change the behavior of this tier of cases dramatically. The final tier of cases contains only one of the search terms. Obviously these are the lowest ranked among the tiers and within this tier, the ranks of cases vary based on the word they have and the column in which the word exists. To summarize, the cases with the exact-phrase need to be ranked first. The cases without an exact-phrase match follow next. And among them, cases with more words are ranked higher than cases with fewer words. First, we will look at an example and then examine how this is implemented. Table 4-1 Example Table for Ranking case_number Comment 1 This is an interesting case of heart cancer 2 Patient complained of pain in the kidney 3 Good Example of malfunction in the heart 4 Lungs severely affected due to smoking 5 a cancer of heart is suspected 6 patient might need a heart transplant soon 7 Looks like cancer Let us say Table 4.1 is a simplified RTF. For simplicity, it has only two columns and contains 7 cases. Now, when a user submits the search phrase cancer of heart, all cases except case number 4 appear. Case number 5 appears first because it has an exact phrase match this forms the first tier of cases. Case number 1 comes next since it has all the words of the search phrase though not in that order. This is the second tier of cases. Case number 3 forms the next tier. It has two words from the search phrase heart and of. Case numbers 2, 6, and 7 comes last as they all have only one word

PAGE 54

45 from the search phrase. These form the last tier of cases. The ranking among these cases depends on the weights assigned to the words themselves. 4.4.2.2 Implementation First an exact phrase search is done. For this, the case_number lists for all the individual words are calculated. Once these lists are computed, the position fields are checked to see if the words exist adjacent to each other within a given case and column. The lists are merged based on this condition. In the end, only one list of case_numbers exists, which are the cases where the exact phrase is present. This list of cases is kept aside for the time being. Also, a special exact-phrase constant value is added to all these cases weights to make sure that they get a better rank than the other lists of cases. It should be noted that the exact-phrase match works recursively. In other words, even partial exact phrase matches are given extra weight when compared to the single word matches. As the next step, each word is searched in the WordList table, and then with its word_id all the info is obtained from the WordInfo table. So, at the end, we have a list of word_ids and each word_id has a case_number list with weights. If we merge all these lists and group them by case_number and then sum the weights, we obtain one final list of case_numbers with its corresponding final weights. This list is merged with the exact-phrase result list. Since we added a constant to the list of exact phrase weights, they have an increased weight and stay at the top of the list. The cases with a larger number of words obtain a better weight because the sum of all the weights is greater. Mathematically, the formula is for each case, the final weight =

PAGE 55

46 [ (column_weight word_weight) ] + exact_phrase_constant*n For all occurrences of all words in a case where exact_phrase_constant is an arbitrary constant added to exact phrase matches to force them higher in the list. For cases having no phrase matches, this value is zero. And, n is the number of phrase matches found in that case. The final list of case_numbers and weights is sorted in descending order and then the TF table is queried for the corresponding case_numbers to obtain the final list of results. 4.4 Summary and What Is Next We examined the design of the 3-layer indexing system. We then saw how WebSE performs better than the traditional search engine, by making use of this indexing system. We looked at the ranking strategies used and also how these are implemented. We also looked at how the indexing system updates itself. The next chapter proposes some future extensions to WebSE.

PAGE 56

CHAPTER 5 CONCLUSIONS AND FUTURE WORK This thesis examined the requirements of a search engine for the RTF and then presented the design and features of WebSE. WebSE focuses on searching the database efficiently and also providing a method of ranking the search results. Presented were an introduction to the problem, a design overview, the design of the existing RTF, the technologies used, the underlying database design, the design of the indexing system and how it improves the query processing, and the ranking strategies used. This chapter presents a final examination of WebSE and proposes some future extensions. 5.1 WebSE In this thesis, a 3-level indexing system was described and used. It not only helped in making the search process efficient, it also made the search scalable to larger databases without a linear increase in processing time. WebSE also sorts the search results based on the relevancy. This allows the user to look at the most relevant cases first, thus making searching for cases easier and more efficient for the user. WebSE presents a user-friendly interface making it easily acceptable in the medical community. The interface used is borrowed from Google. Since almost everyone is familiar with the Google interface, using the WebSE interface is very intuitive. WebSE allows a basic search where the user can search the RTF using a set of keywords or a more advanced search, where the user can add more constraints to the search to make it more specific. 47

PAGE 57

48 WebSE also offers some customizability. The fields to be shown in the search results screen can be changed by the administrator. The administrator can also change the weights assigned to columns to change the relative ordering of search results. This allows for greater control over ranking. 5.2 Extensions The following features can be added to WebSE to make it more user-friendly and powerful as a search tool. 5.2.1 Synonyms WebSE uses a keyword-based search to find relevant cases in the RTF. However, this is not free from limitations. It is a known fact that synonyms exist for many English words and medical terms. So, if someone searches for the term heart, WebSE will only find cases containing that word. However, there might be some other good cases in the RTF that are very much relevant to heart that use the medical synonym cardio. Right now, the user can get around this by searching for all the synonyms of a word, but it would better if WebSE can build a database of synonyms and automatically search for all of them. 5.2.2 Spelling Mistakes Since most of the information in the RTF is entered through typing, there is always a chance of spelling mistakes. This could lead to cases not appearing in the search results. For example, if the word kidney is misspelt as kidey, then this case will not appear for the search term kidney unless there is an occurrence of kidney somewhere else in the same case. As a future extension to WebSE, a strategy for catching such typographical errors can be devised. This will improve the accuracy of the search system and will lead to better utilization of the information in the RTF.

PAGE 58

49 5.3 Summary In the information age, an efficient and easy-to-use search tool can be priceless in any organization, including the medical community. With WebSEs simple and user-friendly interface and efficient and scalable search, the information in the RTF can be accessed quickly. Once the physicians are able to access any case quickly, they can provide a better teaching experience by using these for illustration.

PAGE 59

LIST OF REFERENCES ACR. 1994. American College of Radiology. www.acr.org Visited Apr 2002. Balakrishnan A. 2002. Design and Analysis of User Interface for Radiology Teaching File. Masters Thesis. University of Florida. Dugas M., Trumm C., Stbler A., Pander E., Hundt W., Scheidler J., Brning R., Helmberger T., Waggershauser T., Matzko M. and Reiser M. Case Oriented Computer-Based Training in Radiology: Concept, Implementation and Evaluation. Oct 2001. BMC Medical Education, University of Munich, Germany. Elmasri R. and Navathe S.B. 2001. Fundamentals of Database Systems. Pearson Education, Singapore. Eylen D.V. 1997. AltaVista Ranking of Query Results. http://www.ping.be/~ping0658/avrank.html Visited Feb 2002. Harvard Medical School. 2001. BrighamRAD. http://brighamrad.harvard.edu/education/online/tcd/tcd.html Visited May 2002. Microsoft. 2000a. VBScript. http://msdn.microsoft.com/library/default.asp?url=/library/enus/script56/html/vtoriVBScript.asp Visited Mar 2002. Microsoft. 2000b. MS Access Help. Microsoft, Seattle. Microsoft. 2000c. SQL Server Help. Microsoft, Seattle. Microsoft. 2001. Microsoft ODBC and Universal Data Access. http://www.microsoft.com/data/odbc/default.htm Visited Mar 2002. Microsoft. 2002. Active Server Pages. http://msdn.microsoft.com/library/default.asp?URL=/library/psdk/iisref/aspguide.ht m Visited Nov 2001. Stonebraker M., Hellerstein J.M. 1998. Readings in Database Systems. Morgan Kaufman, San Francisco, CA. University of Washington. 2000. Radiology Teaching File. http://www.rad.washington.edu/maintf/ Visited May 2002. 50

PAGE 60

BIOGRAPHICAL SKETCH Shriram Lakshmi Narasimhan was born in Tiruchy, Tamil Nadu, India, on November 18 th 1978. He has lived in Hyderabad for the first 21 years of his life. He graduated from Jawaharlal Nehru Technological University in 2000 and received his bachelors degree in computer science and engineering. He is pursuing his graduate study with a specialization in databases at the University of Florida. He plans to graduate in December 2002 with a masters degree in computer science. Shriram worked for the University of Florida McTrans Center as a programmer during his graduate study. There, he contributed to the development of the Highway Capacity Software, among other things. He plans to move to Seattle where he has a position with Microsoft as a software design and test engineer in the SQL Server group. 51