Application-independent document storage using a generic markup language


Material Information

Title:
Application-independent document storage using a generic markup language
Physical Description:
x, 189 leaves : ill. ; 29 cm.
Language:
English
Creator:
Harrison, Tony V
Publication Date:

Subjects

Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1995.
Bibliography:
Includes bibliographical references (leaves 173-186).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Tony Vincent Harrison.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002045430
notis - AKN3354
oclc - 33392831
System ID:
AA00003195:00001

Full Text










APPLICATION-INDEPENDENT DOCUMENT STORAGE
USING A GENERIC MARKUP LANGUAGE

By

TONY VINCENT HARRISON

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
































Copyright 1995

Tony V. Harrison
































Dedicated to Mom, Dad, and Ewell.












ACKNOWLEDGEMENTS

I would like to thank Dr. Watson for the opportunity to work with him on this research project. He has helped me in more ways than he knows. This technology is the future of information management. Thanks again, Dennis.

I would also like to thank Dr. Mishoe for taking over for Dr. Shoup as my major professor. He has also been a great friend and help to me as a student. Special thanks to Dr. Beck for his help in understanding the database and retrieval process with FAIRS, and the overall CD-ROM project. Also, I wish to thank Jeff Nelson, David Williams, Ling, and the entire FAIRS staff for their input. Thanks to Dr. Peart for help with learning systems, a process that has been used in this project and others. I would like to thank Dr. Kilmer for his patience throughout this long process.

I would like to thank Mary Cilley for her work with the FAST-WP development process and editing, Steve Eissinger with WP2SGML, and Michael Harper with retrieval software conversion. I would also like to thank Drs. Shoup and Isaacs, whom I consider my mentors during my initial years here at the University of Florida. I would also like to give thanks to all the faculty and staff of the Agricultural and Biological Engineering Department. ... list.

Finally, to Cecil and Mary Harrison, and Dalton and Bernice Harrison, thanks for your support.



















TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

CHAPTERS

I    INTRODUCTION
        Justification
        Overall Objective
        Specific Objective Number One
        Specific Objective Number Two

II   REVIEW OF LITERATURE
        Computers as Information Tools
            Computer Storage
            Computer Recognition of Document Information and Format
            Generic vs. Specific Markup of a Computer Generated Document
        Standard Markup Languages
            History
            ODA/ODIF
            SGML as an International Standard
            Other SGML Standards and Reports Being Reviewed Within the Group That Developed ISO 8879-1986 (Smith, 1989a)
        What Is SGML?
        How Does SGML Describe Structure?
        How Does SGML Work?
        An SGML Application
        Benefits of Converting Documents Into SGML
        Hypertext Markup Language
        Commercial Document Processing Models
        Academic and Academic/Commercial Document Processing Models
            Abstract Document Models
            COBATEF System
            FOAM
            FORMER
            Integrated System ... Complex Computer-Based Documents
            Text Editor ... Lara
            Maestro
            Mixed mode document processing system
            PEN
            TEXTNET
            Other Document Preparation Systems
            Hypertext and Hypermedia Systems

III  PROCEDURES
        Procedures for Specific Objective Number One
        Procedures for Specific Objective Number Two

IV   MODEL DEVELOPMENT PROCESS
        Determine Sample Set of FCES Publications
        Results of Document Analysis on Selected FCES Publications
        Model (DTD) Development and Selection

V    MODEL VERIFICATION
        FCES Publication Preparation
        Conversion of FCES Publications to SGML
        Conversion of FCES Instances Into Retrieval Format
        FAIRS DISCS and DISC
        FAIRS CD-ROM
        Multimedia Viewer
        Guide
        Possible Model Changes

VI   SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
        Summary
        Conclusions/Findings
        Observations
        Recommendations

GLOSSARY

APPENDICES
    DEVELOPMENT OF AN SGML MODEL
    PROCESS AND METHODOLOGY FOR DEVELOPING AN SGML APPLICATION
    SELECTED PUBLICATIONS
    TREE STRUCTURE MODEL
    ELEMENTS, ATTRIBUTES, AND ENTITIES
    STRUCTURE OF THE POPUP MENU USED FOR FAST-WP
    RELATIONSHIP BETWEEN STYLES AND ELEMENTS
    LITERATURE REVIEWED BUT NOT INCLUDED IN THESIS

REFERENCE LIST

BIOGRAPHICAL SKETCH














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

APPLICATION-INDEPENDENT DOCUMENT STORAGE
USING A GENERIC MARKUP LANGUAGE

Tony V. Harrison

May 1995

Chairperson: ... Mishoe
Major Department: Agricultural Engineering

Documents are generally stored in a proprietary software format on a specific computer hardware and software platform. Users of platforms other than those on which the documents were developed must use manual processes, such as rekeying, to access the document information. The objective of this research project was to store documents in a format that allows any computer hardware or software to use the information.

Fifty Florida Cooperative Extension Service (FCES) documents were randomly selected from FAIRS' DISC CD-ROM (produced by the Florida Agricultural Information Retrieval System (FAIRS) at the University of Florida (UF)) for model development. A document analysis of the FCES publications produced a tree structure that identified document elements and their hierarchical relationships. An international standard known as Standard Generalized Markup Language (SGML) was used to represent FCES publication structure as a model. The model (Document Type Definition (DTD)) was developed based on the tree structure and the Association of American Publishers (AAP) Article model. The FCES model consisted of elements of the AAP model having the same definition as those of the tree structure, plus unique elements required by FCES publications. A commercial parser verified that the model conformed to SGML rules and syntax.

Each FCES publication was tagged with Florida's Authoring System Tools for WordPerfect (FAST-WP), whose styles were developed from the structural properties of the model. The tagged FCES publications were then converted into SGML instances (tagged text files) based on the model. Each instance was parsed with the commercial parser to verify that the model was an adequate representation of the structure and content of FCES publications. The FCES model was then found to be application-independent after the instances were converted into FAIRS DISCS and DISC, Multimedia Viewer, and Guide retrieval system formats for on-screen display. The FCES model developed in this research has been incorporated into a vertically integrated electronic information system used by FAIRS to deliver FCES information on CD-ROM and the World Wide Web.













CHAPTER I
INTRODUCTION

Justification

A significant problem facing institutions is the inability to retain and share knowledge in a form independent of specific computer systems. This vast pool of knowledge is distributed primarily as printed material, although electronic files of the documents may be available from the publisher.


For decades, federal documents have been distributed as paper documents and on microfiche (The Office of Technology Assessment, 1988).


The cost of distributing information as printed documents is a concern not only of government agencies but also of private businesses. For example, according to Robert Bennett, second vice president at the Travelers Corporation, only 15 percent of a $300 million budget for printing and publishing was spent in the data-center or printing-center. The remaining 85 percent was associated with the costs of weighing, storage, and the people who sort mail (Francis, 1990). Bennett believes that an electronic printing and publishing process can reduce a significant portion of the printing and publishing budget at Travelers Corporation.









The Florida Cooperative Extension Service (FCES) has not been immune to this problem. The high cost of distributing printed documents has led to efforts to distribute information electronically through computer systems. FCES produces about 500 printed publications per year. Budget cuts and increasing publishing costs moved FCES to place all publications on the Florida Agricultural Information Retrieval System (FAIRS) (Beck et al., 1994). FAIRS provides delivery of all FCES documents in electronic form to county extension specialists, researchers, growers, and homeowners. FAIRS uses hypertext and full-text retrieval strategies to search, retrieve, and browse information from over ... megabytes of data. The same word processing files used in the printed publication process are delivered through the FAIRS database.


However, converting electronic documents into the FAIRS database format was a laborious process. A text editor divided each document into sections based upon its structure and content. Besides the labor requirement, revising an electronic publication required repeating the entire conversion process. These restrictions gave the impetus to automate the entry of structure and content from publications into electronic information systems.
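The manual sectioning step described above is the kind of work that explicit markup lets software repeat automatically after every revision. The sketch below is a minimal illustration under stated assumptions, not the FAIRS conversion software: it assumes a fully tagged instance whose sections use hypothetical `sec` and `head` elements that are not nested, and it splits the instance into one database-style record per section with Python's `re` module.

```python
import re
from typing import Dict, List

# Hypothetical, fully tagged instance; element names are illustrative only.
SAMPLE = """<pub>
<title>Citrus Fertilization</title>
<sec><head>Introduction</head><p>Nitrogen is required for growth.</p></sec>
<sec><head>Application Rates</head><p>Apply in early spring.</p></sec>
</pub>"""

# One match per (non-nested) <sec>; group 1 is the heading, group 2 the body.
SECTION = re.compile(r"<sec>\s*<head>(.*?)</head>(.*?)</sec>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def split_into_records(instance: str) -> List[Dict[str, str]]:
    """Return one database-style record (heading + plain text) per section."""
    records = []
    for heading, body in SECTION.findall(instance):
        text = ANY_TAG.sub(" ", body)   # strip nested tags such as <p>
        text = " ".join(text.split())   # collapse runs of whitespace
        records.append({"heading": heading.strip(), "text": text})
    return records


if __name__ == "__main__":
    for record in split_into_records(SAMPLE):
        print(record)
```

Rerunning the function after a revision regenerates every record, which replaces the by-hand repetition the text describes.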


Electronic publishing typically begins with the use of word processing software to develop documents. Word ... generate electronic documents (Rahtz, 1987).


However, there is often limited ability to transfer these documents directly into an electronic information system, because word processing software uses different coding schemes to represent a document's structure (hierarchical organization, depicted by titles and headings) and content (i.e., text, figures, tables). In contrast, software systems for information retrieval typically use a database structure for storing structure and content (Beck and Watson, 1992). These two representations are incompatible, producing document transferability and portability problems between word processing and electronic information systems.


Retaining document structure and information content through changes in computer hardware and software is critical to successful long-term electronic information system development. As computer hardware and software constantly evolve and change, knowledge must outlive the technology used to develop and initially deliver the information. Yuri Rubinsky (1989) says that "Although one cannot anticipate future publishing requirements and technologies, a plan can be developed to recycle information. The best way to do this is to store information in a standardized way, independent of particular technology or presentation method" (page ...).








Overall Objective

The overall objective of this research project was to model the structure and represent the data of technical publications in an electronic form that is independent of any computer hardware or application.


Specific Objective Number One

The first specific objective was to model the structure of a set of technical publications and represent the content of the publications in an application-independent form.

Specific Objective Number Two

The second specific objective was to verify the model by automating and testing a process of using FCES publications in SGML form for electronic storage and delivery.


The remainder of this dissertation is organized into five chapters. Chapter II provides an overview of generic markup: its description, uses, and applications. Also included is a review of several commercial and academic hypertext and hypermedia models, formatting models, and models used in document conversion. Chapter III describes the procedures for accomplishing the overall and specific objectives. Chapter IV describes the model development process. Chapter V describes the process of converting FCES publications into an application-independent format ... present document information. A summary of the author's work, conclusions, and recommendations for future work in modeling documents are presented in Chapter VI.













CHAPTER II
REVIEW OF LITERATURE

Computers as Information Tools

Computer Storage

Computers store information in electronic form, such as a business's customer database that includes each customer's name, address, and telephone number. When this information is organized, businesses can develop multiple applications such as billing, sales promotions, and customer service.


Files, chapters, page numbers, and series of pages are ways to organize both structural and contextual information in electronic documents (Graphic Communications Association, 1991). The contents of electronic documents vary and can include text, images, graphics, spreadsheets, and voice, which can impede the transfer of information (Ansen, 1989).


However, the key process is the electronic storage and retrieval of the information for human consumption.


Computer Recognition of Document Information and Format

Text is presented in two-dimensional form as information for human consumption (Horak, 1984). For example, technical ... reader interpretation of the information. The structure aids a reader in finding desired information. However, computers only process the information, while humans interpret it based on the structure.


When computers are used for document processing, they are primarily used for either printing or viewing a document. For example, one could use the information in a printed software manual describing installation procedures for a particular software package. The structure of the manual describes the organization of the information. From this organization, subjects are recognized based on formatting areas such as titles, where sentences and words end, key words, and punctuation.


Coombs (1987) suggests that documents with both accurate and descriptive markup can be ported from one computer system to another. Descriptive markup of each subject area in a document allows the information to be processed in many ways.


However, while software can retrieve particular database fields, such as customer name, because of their explicit structure, documents usually have no explicit information structure. While the computer only sees the information in an electronic document, humans can distinguish between the format and the information of the document. Computers with this ability would treat any electronic document as a database of information.
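To illustrate treating a marked-up document like a database, the sketch below selects element types the way one would select record fields. It is a minimal sketch under stated assumptions: the element names are hypothetical, and the instance is fully tagged with no omitted tags, so Python's standard `xml.etree.ElementTree` parser can read it (a general SGML instance that uses tag omission would need an SGML parser instead).

```python
import xml.etree.ElementTree as ET
from typing import List

# A database record: the field name is explicit in the data structure.
customer = {"name": "A. Jones", "address": "1 Main St.", "phone": "555-0100"}

# A hypothetical, fully tagged publication: generic markup makes the
# document's structure explicit in the same way.
INSTANCE = """<pub>
  <title>Lawn Fertilization</title>
  <author>J. Smith</author>
  <author>A. Jones</author>
  <sec><head>Timing</head><p>Fertilize in spring.</p></sec>
</pub>"""


def field(doc: ET.Element, name: str) -> List[str]:
    """Return the text of every element of one type, like selecting a field."""
    return [el.text.strip() for el in doc.iter(name) if el.text]


doc = ET.fromstring(INSTANCE)
print(customer["name"])        # A. Jones
print(field(doc, "author"))    # ['J. Smith', 'A. Jones']
print(field(doc, "head"))      # ['Timing']
```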


... manual, handbook, or other document and use it in many different applications.


Desktop publishing software and some word processors enable publishers to set up standard print styles so documents from different authors can be incorporated into a uniform series. These word processors also make the final printed text easier for readers to interpret by making the structure more apparent (Wilson, 1991).


For example, ... a word processing file can allow different computer hardware to use and exchange information ... representation. First, the electronic file contains codes for specific page size, fonts, and text position. Proper conversion or duplication of the print presentation on a different system requires the same word processing software, printer, and perhaps soft fonts.


Second, computers allow the use of many different proprietary software languages and packages. Incompatible computer hardware and software formats require manual intervention for users to use the information. This restricts access to the information inside these electronic documents. Finally, electronic access to documents from different institutions and companies can be very difficult.


Valuable time is spent explaining the different versions, and possibly helping to make the document compatible with another computer system. Upgrading hardware or software ... None of these concerns are new. Smith (1985) quotes a January 1979 report of the National Computer Users' Forum addressing these concerns. The report describes multiple character sets as the biggest problem when interchanging data. The primary cause of these multiple character sets is poor observance of standards.


Generic vs. Specific Markup of a Computer Generated Document

The hierarchical organization of information in electronic documents aids the reader's comprehension of the material. In printed form, a reader can quickly scan a document and understand its structure by viewing the different typefaces and sizes used for various levels of information. For example, a large, bold, centered font could represent the highest level headings, and a left-justified, medium-size font could represent lower level headings. These typographical conventions provide visual cues for the reader and are important for the video display of an electronic document. For example, one could use the highest level headings as a table of contents, with a hyperlink to the text under each heading. However, the font and text justification used by one author for first level headings may be the same format used by another author for third level headings. These structural ambiguities prevent modeling documents for use ...









Requiring authors to use the same word processor would allow the production of a model describing the structure and appearance of a set of documents. The model would define specific font and positioning codes (i.e., specific markup) to describe each structural element (i.e., generic markup such as title and author). The explicitness of the model allows software to convert from the file format of the word processing program to a format that electronic information systems could understand. However, a model of a set of documents must account for differences in specific formatting codes and preferences before it can be used in an electronic information system.


A layer of abstraction can provide a way to account for different formatting codes and preferences in an electronic document. Goldfarb (1980) describes this layer of abstraction as markup. The markup should include first the separation between the model and specific formatting codes, and then the processing functions. Generic markup provides the needed layer of abstraction: for example, a structural element identifies a first level heading in a document as Head instead of marking it with a large, bold, centered font.
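The ambiguity described above can be shown in a few lines. In this sketch, the two authors' formatting conventions (point size, bold, alignment) are assumed values chosen for illustration: the same large, bold, centered format means a first level heading for one author and a third level heading for the other, so the level cannot be recovered from specific markup alone. Generic markup avoids the guess by carrying the element name with the text.

```python
from typing import Dict, List, Tuple

# Specific markup as (point size, bold, alignment); the values are assumptions.
Format = Tuple[int, bool, str]

AUTHOR_A: Dict[str, Format] = {"Head1": (18, True, "center"), "Head3": (12, True, "left")}
AUTHOR_B: Dict[str, Format] = {"Head1": (14, True, "left"), "Head3": (18, True, "center")}


def infer_level(fmt: Format, convention: Dict[str, Format]) -> str:
    """Guess a generic heading name from specific markup, given one author's habits."""
    for name, spec in convention.items():
        if spec == fmt:
            return name
    raise KeyError(f"no heading style matches {fmt}")


large_bold_centered: Format = (18, True, "center")
print(infer_level(large_bold_centered, AUTHOR_A))  # Head1
print(infer_level(large_bold_centered, AUTHOR_B))  # Head3

# With generic markup the element name travels with the text, so no guess
# (and no knowledge of either author's conventions) is needed.
tagged: List[Tuple[str, str]] = [("Head1", "Introduction"), ("Para", "Citrus trees need nitrogen.")]
print([text for name, text in tagged if name == "Head1"])  # ['Introduction']
```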


Many word processing programs provide generic markup with a feature commonly called styles or style sheets, which separate document structure from text (content). Style sheets ... documents can serve as a way to introduce generic markup into an electronic document (Cilley and Watson, 1992a and 1992b). The generic markup allows further processing of a document for database storage or for other processing, such as CD-ROM (Cilley et al., 1990).


For example, the authors of a document are identified with an "author" electronic style code instead of specific font information. At the time of printing or display, specific markup replaces the generic codes based upon the style.
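A minimal sketch of this substitution step, assuming a document stored as (style name, text) pairs and two hypothetical style sheets, one for print and one for screen display. Only the style sheet changes when the output device changes; the generically marked document is reused unchanged. The bracketed font descriptions are placeholders, not real printer codes.

```python
from typing import Dict, List, Tuple

# Generic markup: each paragraph carries a style (element) name, not font codes.
DOCUMENT: List[Tuple[str, str]] = [
    ("title", "Citrus Fertilization"),
    ("author", "J. Smith"),
    ("para", "Apply nitrogen in early spring."),
]

# Hypothetical style sheets; the bracketed text stands in for real font codes.
PRINT_STYLES: Dict[str, str] = {
    "title": "[18pt bold centered] {}",
    "author": "[12pt italic] {}",
    "para": "[11pt roman] {}",
}
SCREEN_STYLES: Dict[str, str] = {"title": "== {} ==", "author": "by {}", "para": "{}"}


def render(document: List[Tuple[str, str]], styles: Dict[str, str]) -> str:
    """Replace each generic code with the specific markup for one output device."""
    return "\n".join(styles[name].format(text) for name, text in document)


print(render(DOCUMENT, SCREEN_STYLES))
print(render(DOCUMENT, PRINT_STYLES))
```

A style name missing from a style sheet raises `KeyError`, which makes an incomplete style sheet visible instead of silently dropping text.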


A system using generic markup was initiated by the University of Florida CD-ROM project (CD-ROM Implementation Group, 1990). The developers' approach was to add generic codes to word processing documents as an aid to knowledge acquisition. Adding the same generic codes to files containing the same structure from multiple subject areas would allow these files to be modeled.


Model development for similar documents provides the conversion of documents into both structure and content. This separation gives similar documents an explicit structure that electronic information systems can use, for example, to perform a query search or display document information on-screen. Standard Generalized Markup Language (SGML) (ISO 8879-1986) provides the basis for structural model development.
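As a sketch of how word processing files with generic markup could be turned into SGML instances, the following maps paragraph style names to element names and writes a fully tagged ASCII text file. The style names, element names, and flat structure are hypothetical simplifications; they are not the FAST-WP styles or the FCES DTD. Characters that SGML treats as markup delimiters (`&`, `<`, `>`) are written as entity references, which assumes the DTD declares the corresponding entities (for example, through the ISOnum public entity set).

```python
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape

# Paragraphs as exported from a word processor: (style name, text).
# Style and element names are hypothetical, and the structure is kept flat.
PARAGRAPHS: List[Tuple[str, str]] = [
    ("Title", "Pruning Citrus"),
    ("Author", "J. Smith"),
    ("Heading1", "When to Prune"),
    ("Body", "Prune after harvest & before bloom."),
]

STYLE_TO_ELEMENT: Dict[str, str] = {
    "Title": "title",
    "Author": "author",
    "Heading1": "head",
    "Body": "p",
}


def to_instance(paragraphs: List[Tuple[str, str]], root: str = "pub") -> str:
    """Write a fully tagged instance; an unmapped style raises KeyError."""
    lines = [f"<{root}>"]
    for style, text in paragraphs:
        element = STYLE_TO_ELEMENT[style]
        text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
        lines.append(f"<{element}>{escape(text)}</{element}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


print(to_instance(PARAGRAPHS))
```

Raising an error on an unmapped style, rather than guessing an element, reflects the point that conversion depends on a fixed correspondence between styles and model elements.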









Standard Markup Languages

History

Publishing companies recognized the need for a standard specifying document architecture early in the 1960s (Rodgers, 1989). In September 1967, William Tunnicliffe of the Graphic Communications Association (GCA) suggested using generic coding or descriptive tags to separate information (content) from format (Goldfarb, 1990). In 1969, Charles Goldfarb developed the Generalized Markup Language (GML) (Goldfarb et al., 1970; Goldfarb, 1980) to integrate office information systems (Goldfarb, 1990). Goldfarb's work served as the basis for the international standard, Standard Generalized Markup Language (ISO 8879-1986). Another approach to specifying document architecture is the Office Document Architecture.


ODA/ODIF

The Office Document Architecture and Office Document Interchange Format (ODA/ODIF) is a document interchange approach currently under development (Department of Commerce, 1988). Ansen (1989) describes the architecture as a set of standards for both structuring and encoding documents for interchange between dissimilar systems. A draft ... suggests that ODA looks at document structure both logically (e.g., sections) and from a layout view (i.e., a physical view that decides how the content appears).


Scheller (1988) outlines the following differences between SGML and ODA:

ODA restricts attributes to those specified in the standards, while SGML enables the definition of any desired attribute.

SGML documents have no semantics defined in the standard, while ODA documents contain semantics for document representation.

ODA restricts the content of documents to the standard, while SGML has no restrictions.

ODA documents are interpreted at special input machines and systems, while SGML has no such required restrictions.

Interchange of ODA documents between different personnel needs no agreement, while SGML documents can only be interpreted within special applications and environments.

ODA documents are represented by a special formatter using the semantics of the layout description in the document formatting language. For SGML documents, the formatter and the document class describe the representation.


Scheller (1988) identifies the most crucial difference between ODA and SGML as "the fact that ODA facilitates the ... documents can only be interchanged within clearly defined applications areas but are not subject to restrictions with respect to functionality" (page 142). He also says that a number of representational possibilities, context-dependent layout descriptions, content types, and automatic generation of both tables of contents and references are absent when ODA is used in the technical, scientific, and publishing area.


SGML as an International Standard

In the early 1980s, the International Organization for Standardization (ISO) began preparing standards to allow the transfer of multiple document types across varied computer systems (Bryan, 1988). In December of 1986, ISO issued a standard for document representation known as the Standard Generalized Markup Language. The standards committees that deal with the SGML standard areas are as follows:

Joint Technical Committee (JTC1) on Information Processing.
Subcommittee (SC18) on Text and Office Systems.
Working Group (WG8) on Computer Languages for the Processing of Text.

The project editor of the standard is Charles Goldfarb of IBM.










Other SGML Standards and Reports Being Reviewed Within the Group That Developed ISO 8879-1986 (Smith, 1989a)

Document Style Semantics and Specification Language (DSSSL - ISO 10179)

This standard provides a language to describe the translation of SGML markup in a document into a specific format. The simplest application would be a style sheet. This goes in a specification separate from the DTD. It allows the exchange of an SGML file, together with specific DSSSL representations of the document, among different systems.

Font Information Interchange (ISO/DIS 9541)


Problems occur when a page or document in DSSSL form is exchanged from one computer to another and the printers are different. For example, a 14-point Helvetica font may look different from one printer to another. Thus, there needs to be a standard way of describing fonts so that the printed pages of a document are identical from system to system.

Guidelines for SGML Syntax-Directed Editing Systems

This technical report specifies a series of guidelines for the capabilities of an SGML syntax-directed editing system.


Retroactive Conversion

This technical report ... do not understand SGML. It looks at SGML features that can reduce the markup needed within a document.

SGML Document Interchange Format (SDIF - ISO 9069)

This SGML standard provides the means to interchange documents using open systems interconnection (OSI) techniques.


Standard Page Description Language (SPDL - ISO 10180)

Xerox and Adobe are developing this standard way of exchanging finished document pages between computer systems, in a manner similar to the PostScript language. For example, an application that receives an electronic document page from another computer would provide identical printing of the page.

Techniques for Using SGML (ISO/DTR 9573)

These techniques describe the design of document type definitions, including mathematics.


Criticisms directed at this area, as outlined by Smith (1989a), concern taking advantage of database publishing and mathematics. There are several ongoing projects in SGML conformance (Graphic Communications Association, 1991). First, there is an initiative from the executive committee of the Graphic Communications Association ... international standards.


Second, there is a project to develop a binary encoding of SGML (SGML-B). This entails a one-to-one translation between an SGML file and an SGML-B file to enable quick access time on a computer. This is important for CD-ROM production. Currently, sequential coding requires building indexes for fast access to information. SGML-B will allow CD-ROM production personnel to place a binary file directly on a CD.
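The SGML-B format itself is not described here in enough detail to reproduce, so the sketch below shows only the step it is meant to remove: building an index over a sequentially coded SGML text so that an element can be reached by its character offset instead of by reading the file from the beginning. The `head` element and the sample instance are hypothetical.

```python
import re
from typing import Dict


def build_index(text: str, element: str) -> Dict[str, int]:
    """Map the content of each <element> to the offset where its start tag begins."""
    tag = re.escape(element)
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    return {m.group(1).strip(): m.start() for m in pattern.finditer(text)}


# A hypothetical instance stored as sequential text.
instance = "<pub><head>Soils</head><p>...</p><head>Irrigation</head><p>...</p></pub>"

index = build_index(instance, "head")
print(index)  # {'Soils': 5, 'Irrigation': 33}

# With the index, a reader can jump straight to a section instead of scanning.
start = index["Irrigation"]
print(instance[start:start + 23])  # <head>Irrigation</head>
```

The index must be rebuilt whenever the text changes, which is part of the overhead the text attributes to sequential coding.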


Third, the Hypermedia/Time-based Structuring Language (HyTime), a solution for hypertext, was first published by ISO in late 1992 (ISO/IEC 10744:1992). HyTime was added to provide ways for different information to coexist and work together in an ever-changing environment. The combination of HyTime and SGML provides greater information management among different media, such as textual information, audio, animation, and digital information. As of this writing (August, 1994), there are fully HyTime-conforming applications in the marketplace.
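The indexing that sequential coding requires for fast access, mentioned above for CD-ROM production, can be sketched as follows. This is a minimal illustration, assuming a plain-text instance with explicit end tags; it does not implement SGML-B (whose encoding is not described here), only a one-pass index of element start offsets that lets later lookups jump straight to an element instead of rescanning the whole file. The element names are hypothetical.

```python
import re

START_TAG = re.compile(r"<([A-Za-z][\w.-]*)>")

def build_index(instance: str) -> dict[str, list[int]]:
    """Scan the tagged text once, recording the character offset of every
    start tag under its generic identifier (element name)."""
    index: dict[str, list[int]] = {}
    for match in START_TAG.finditer(instance):
        index.setdefault(match.group(1), []).append(match.start())
    return index

def read_element(instance: str, offset: int, name: str) -> str:
    """Return the content of the element whose start tag begins at `offset`.
    Assumes an explicit end tag </name> and no nested element of the same name."""
    start = instance.index(">", offset) + 1
    end = instance.index(f"</{name}>", start)
    return instance[start:end]

doc = ("<doc><title>Citrus</title>"
       "<para>Water weekly.</para><para>Prune in spring.</para></doc>")
idx = build_index(doc)                           # built once, reused for every lookup
print(idx["para"])                               # offsets of both <para> start tags
print(read_element(doc, idx["para"][1], "para"))  # Prune in spring.
```

The trade-off is the usual one: the index costs one full pass and some storage, but every later lookup is a direct jump rather than a sequential scan.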


What SGML Means

SGML is a standard for full-text database publishing (Smith, 1986b) that defines the characteristics for processing information safely across systems (SoftQuad, 1991). It provides a descriptive architecture and a specific language syntax. In SGML document terminology, the model is called a document type definition (DTD). Documents converted to an SGML format based on the structural model are called instances.

For model development, SGML is the standard for defining the element names (i.e., subject areas) and their order, location, frequency, and relationships within a document. An SGML model explicitly follows the structure of a set of documents. For example, an SGML model could require the generic name "chapter" to be placed at each chapter heading in a set of documents.
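To make the vocabulary concrete, here is a hypothetical model (DTD) and instance held in Python strings. The element names (`manual`, `chapter`, `title`, `para`) are illustrative only, not taken from any real DTD. The two helper functions are not an SGML parser; they simply list the generic identifiers declared in the model and the start tags used in the instance, showing that each chapter heading carries the generic name "chapter".

```python
import re

# Hypothetical DTD for a set of manuals. Each <!ELEMENT ...> declaration names
# an element, its tag minimization ("- -" both tags required, "- O" end tag
# omissible), and its content model.
DTD = """
<!ELEMENT manual  - - (title, chapter+)>
<!ELEMENT chapter - - (title, para*)>
<!ELEMENT title   - O (#PCDATA)>
<!ELEMENT para    - O (#PCDATA)>
"""

# Hypothetical instance: plain ASCII text with generic markup.
# The title and para end tags are omitted, as the "- O" declarations allow.
INSTANCE = """<manual><title>Engine Care
<chapter><title>Oil
<para>Check the oil level monthly.
</chapter>
<chapter><title>Filters
<para>Replace the air filter yearly.
</chapter>
</manual>"""

def declared_elements(dtd: str) -> list[str]:
    """Return the element names declared in the DTD, in declaration order."""
    return re.findall(r"<!ELEMENT\s+(\S+)", dtd)

def start_tags(instance: str) -> list[str]:
    """Return the generic identifiers of all start tags, in document order."""
    return re.findall(r"<([A-Za-z][\w.-]*)>", instance)

print(declared_elements(DTD))               # ['manual', 'chapter', 'title', 'para']
print(start_tags(INSTANCE).count("chapter"))  # 2
```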


Modeling allows the interpreted structure of most documents in a set to be converted into electronic documents for standard information file systems. SGML allows the use of ASCII, the American Standard Code for Information Interchange, a standard character format (i.e., file format) that SGML can use to represent DTDs (models) and instances. For example, software could convert word processing files with generic markup into ASCII format (instances) based on the model (DTD). Electronic information systems then translate the instances into their respective formats for video display.


What SGML Does Not Mean

First, SGML is not a tag or programming language. It does not require specific elements in a model or provide a set of rules to mark up a document. Second, SGML does not define how electronic information systems access the information within a document. Third, an SGML file will not describe how to process an element; an element is open to any processing application. The SGML application, a program that uses the tagged SGML file, decides how each element will be processed.

The data or content of an element is not important to SGML, and there is no way to verify that the information between the tags is appropriate for that type of element. For example, an entire document can be placed under the element "title" in the SGML file (instance).
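The point that markup is checked but meaning is not can be shown with a small sketch. The one-rule "model" here (a document must open with a closed `title` element) is hypothetical and far simpler than a real DTD, but it behaves the same way in this respect: an instance with the whole text stuffed into `title` passes, because nothing in a structural check inspects what the title says.

```python
import re

def starts_with_title(instance: str) -> bool:
    """Structural check only: the first element must be <title> and the
    title must be closed. The check never inspects what the title contains."""
    tags = re.findall(r"</?([A-Za-z][\w.-]*)>", instance)
    return bool(tags) and tags[0] == "title" and "</title>" in instance

good = "<title>Growing Tomatoes</title><para>Plant after the last frost.</para>"
# The entire document has been placed inside the title element.
bad = "<title>Growing Tomatoes. Plant after the last frost. Stake the vines.</title>"

print(starts_with_title(good))  # True
print(starts_with_title(bad))   # True: structurally valid, semantically wrong
```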


How Does SGML Describe Structure?

SGML defines document structure formally so that a computer application can use the information. However, the structure must not be too restrictive; it must be flexible enough to represent several types of documents. For example, one set of documents might have an introduction followed by chapters, while another set has only chapters. A structure that requires an introduction would be too restrictive for the second set, whereas a structure consisting of an optional introduction followed by multiple chapters allows both document classes to be defined by the same model.
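The flexible structure just described corresponds to an SGML content model such as `(intro?, chapter+)`: an optional introduction, then one or more chapters. A content model of this kind behaves like a regular expression over the sequence of child element names, so a sketch can test it with Python's `re` module. The element names `intro` and `chapter` are hypothetical, and real SGML parsers compile content models themselves rather than using `re`.

```python
import re

# Hypothetical content model for the document element, in SGML notation:
#   <!ELEMENT book - - (intro?, chapter+)>
# Translated to a regular expression over space-terminated element names.
CONTENT_MODEL = re.compile(r"(intro )?(chapter )+")

def conforms(children: list[str]) -> bool:
    """Check a sequence of child element names against the content model.
    Each name gets a trailing space so the regex matches whole names only."""
    sequence = "".join(name + " " for name in children)
    return CONTENT_MODEL.fullmatch(sequence) is not None

print(conforms(["intro", "chapter", "chapter"]))    # True: first document class
print(conforms(["chapter", "chapter", "chapter"]))  # True: second document class
print(conforms(["intro"]))                          # False: no chapter at all
print(conforms(["chapter", "intro"]))               # False: wrong order
```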


An SGML file contains a model (DTD) and tagged text (instance). The model normally precedes each tagged text and enables computer software to learn the structural properties of the document that follows. For example, a document type definition (DTD) could be the model that defines the structure of an automobile technical manual. The DTD would appear before any instance representing a tagged automobile technical manual.


How Does SGML Work?

An SGML model provides the names (i.e., generic identifiers), order, location, and frequency of elements in a document. When needed, attributes of elements provide a greater description of the elements in the document structure. For example, the element "author" could have an attribute named "sex," which can have the attribute values "male" or "female."


Thus, an element may contain data and/or other structural properties. Each element can contain information (data) and/or other elements (content model). For example, a chapter may contain other elements such as a title, paragraphs, and sections. There is no limit or restriction to the type of information that may be placed inside an element. For example, an element can be named "videotape," and the data within the element may then be a VHS-coded tape. Conceptually, the model is always the same.
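The "author" element with its "sex" attribute, described above, would be declared in SGML with an attribute-list declaration, and a parser would reject any value outside the declared list. The sketch below mimics only that value check; the declaration shown in the comment and the `DECLARED_VALUES` table are illustrative assumptions. It compares values case-sensitively, which is a simplification of SGML's name-case rules.

```python
# Hypothetical attribute-list declaration, in SGML notation:
#   <!ATTLIST author sex (male | female) #IMPLIED>
DECLARED_VALUES = {("author", "sex"): {"male", "female"}}

def check_attribute(element: str, attribute: str, value: str) -> bool:
    """Return True if `value` is one of the values declared for the attribute.
    Undeclared element/attribute pairs are rejected."""
    allowed = DECLARED_VALUES.get((element, attribute))
    return allowed is not None and value in allowed

print(check_attribute("author", "sex", "female"))   # True
print(check_attribute("author", "sex", "unknown"))  # False
```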


An SGML processor reads an SGML tagged text file (instance) and processes it with two components, a parser and a representer (Graphic Communications Association, 1991). The parser reads and understands the instance, then passes the information to the representer. The representer can then, for example, convert the information (instance) into a format suitable for on-screen display, or provide an image of the information for presentation by a publishing system.
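The division of labor between parser and representer can be sketched as two functions: a parser that turns tagged text into a stream of events, and a representer that decides how each element is displayed. This is a simplified illustration, assuming explicit start and end tags with no attributes, comments, or entity references. The element names and the on-screen conventions (capitalized titles, indented paragraphs) are hypothetical choices.

```python
import re
from typing import Iterator, Optional

# A token is either a start/end tag or a run of character data.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")

def parse(instance: str) -> Iterator[tuple[str, str]]:
    """Parser stage: turn tagged text into ('start'|'end'|'data', value) events."""
    for match in TOKEN.finditer(instance):
        slash, name, text = match.groups()
        if name is not None:
            yield ("end" if slash else "start", name)
        elif text.strip():
            yield ("data", text.strip())

def represent_for_screen(events: Iterator[tuple[str, str]]) -> str:
    """Representer stage: format each piece of data according to the
    innermost open element."""
    lines: list[str] = []
    open_elements: list[str] = []
    for event, value in events:
        if event == "start":
            open_elements.append(value)
        elif event == "end":
            open_elements.pop()
        else:
            current: Optional[str] = open_elements[-1] if open_elements else None
            if current == "title":
                lines.append(value.upper())
            elif current == "para":
                lines.append("    " + value)
            else:
                lines.append(value)
    return "\n".join(lines)

doc = "<factsheet><title>Lawn Care</title><para>Mow high.</para></factsheet>"
print(represent_for_screen(parse(doc)))
```

Swapping in a different representer, for example one that emits typesetting codes, would leave the parser untouched. This separation is what lets one instance serve several systems.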


An SGML instance is a database of information that does not do anything by itself. The parser reads and understands the information in the document by following the SGML model (DTD). It reads the DTD to distinguish between information and generic markup (i.e., element names) in the instance, and uses the DTD to recognize each element (generic name) in the instance. The instance is considered validated when the parser verifies that each element found in the instance belongs in the location, order, and frequency that the model specifies.
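The validation step can be sketched as a stack-based walk over the tags: each element's children are collected, and when the element closes, the sequence of child names is checked against that element's content model. This is only a sketch under stated assumptions. The models below are hypothetical, written as regular expressions rather than SGML declarations. The code also assumes explicit end tags, whereas a real SGML parser also handles tag omission, attributes, and entities.

```python
import re

# Hypothetical content models over space-terminated child element names.
# Elements holding only character data are modeled as having no children.
MODELS = {
    "factsheet": re.compile(r"(title )(section )+"),
    "section": re.compile(r"(title )(para )*"),
    "title": re.compile(r""),
    "para": re.compile(r""),
}

TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)>")

def validate(instance: str) -> list[str]:
    """Return error messages; an empty list means the instance conforms."""
    errors: list[str] = []
    stack: list[tuple[str, list[str]]] = []  # (element name, child names so far)
    for match in TAG.finditer(instance):
        slash, name = match.groups()
        if name not in MODELS:
            errors.append(f"undeclared element <{name}>")
            continue
        if not slash:  # start tag: record it as a child of the open element
            if stack:
                stack[-1][1].append(name)
            stack.append((name, []))
            continue
        if not stack or stack[-1][0] != name:
            errors.append(f"unexpected end tag </{name}>")
            continue
        element, children = stack.pop()
        sequence = "".join(child + " " for child in children)
        if MODELS[element].fullmatch(sequence) is None:
            errors.append(f"<{element}> has invalid content: {children}")
    errors.extend(f"unclosed element <{element}>" for element, _ in stack)
    return errors

good = ("<factsheet><title>Roses</title><section><title>Pruning</title>"
        "<para>Cut back in winter.</para></section></factsheet>")
bad = "<factsheet><section><title>Pruning</title></section></factsheet>"
print(validate(good))  # []
print(validate(bad))   # the factsheet is missing its title
```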


An electronic information system that can read SGML instances of a specific SGML model may then use them directly. However, a representer can also be used to prepare the information in the instance for storage and retrieval systems that lack this capability, each according to its own use. A representer prepares information in the instance for use by software such as electronic information systems; thus, the representer creates a particular representation of the elements.

There are many ways to develop a representer, which … The representer needs instructions that it can understand. It will need a list of elements, their meaning (e.g., title), their needs, and the specific instructions to give the application when it receives the elements. The representer expects the proper input, because the parser will verify that the SGML file corresponds to the DTD. Examples of representers include a converter to formatting codes, a database loader, and query search software.
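Two of the representer examples named above, a database loader and a query search, can be sketched together with Python's built-in `sqlite3` module. The loader stores each piece of character data with the name of the element that directly contains it, so a query can be restricted to titles instead of matching anywhere in the text. The table layout, document identifiers, and element names are hypothetical, and the tag handling assumes well-formed input with explicit end tags.

```python
import re
import sqlite3

TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")

def load(instance: str, doc_id: str, db: sqlite3.Connection) -> None:
    """Database-loader representer: one row per piece of character data,
    tagged with the innermost element that contains it."""
    open_elements: list[str] = []
    for slash, name, text in (m.groups() for m in TOKEN.finditer(instance)):
        if name is not None:
            if slash:
                open_elements.pop()
            else:
                open_elements.append(name)
        elif text.strip() and open_elements:
            db.execute("INSERT INTO content VALUES (?, ?, ?)",
                       (doc_id, open_elements[-1], text.strip()))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE content (doc TEXT, element TEXT, body TEXT)")
load("<factsheet><title>Citrus Pests</title><para>Scout weekly.</para></factsheet>", "FS-1", db)
load("<factsheet><title>Lawn Care</title><para>Mow high.</para></factsheet>", "FS-2", db)

# Query search: look only inside title elements, not the whole text.
rows = db.execute("SELECT doc FROM content WHERE element = 'title' AND body LIKE ?",
                  ("%Citrus%",)).fetchall()
print(rows)  # [('FS-1',)]
```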


An SGML Application

The development of an SGML document model often requires that goals for the SGML application be set and a working group be selected. A working group is the group of individuals involved in the development of an SGML application (model). Given a subset of a class of documents, such as fact sheets (Figure A-1, Appendix A), the working group breaks the documents down into pieces (Figure …, Appendix A) and develops a tree structure representation (Figure …, Appendix A) of the document model. The model (DTD) (Figure A-4, Appendix A), a vocabulary representation of the document structure, is written upon completion of the document analysis and validated for conformance to ISO 8879-1986 standards and SGML syntax. Tagged documents (Figure …, Appendix A), known as instances, are validated to ensure conformance (i.e., no errors) with the DTD.










SGML applications can be either narrow or broad in scope. The Electronic Manuscript Project (1987) of the Association of American Publishers (AAP) and the Computer-aided Acquisition and Logistic Support (CALS) of the Department of Defense are two broad SGML applications. AAP developed SGML applications for book, journal, and article creation between 1983 and 1987. The AAP SGML application standard enjoys wide support in publishing industries such as CD-ROM, and has been adopted as an ANSI application standard (Z39.59).

Cover (1992) and SGML Associates, Inc. (1992) list seven document models (DTDs) that are available on the Internet. These include the Text Encoding Initiative (TEI) DTDs; MAJOR (Modular Application for Journals) DTDs based upon the AAP Article DTD; a HyTime DTD; public DTDs available from Exeter; the CALS-BBS forum; DTDs supporting the AAP/EPSIG manuscript standard; and the "Information Architecture" working group DTD of the OSF Documentation Special Interest Group.


Benefits of Converting Documents Into SGML Instances

Time savings is a major benefit of using generic markup, because a document does not have to be coded twice (Graphic Communications Association, 1991). The publication process is also reduced, because no further rekeying or proofreading of text is required. While initiating the use of standard … initially using word processing macros to apply specific markup to document information. However, this was reduced by recognizing, then tagging, the structural elements with both generic and specific markup. This made the job a lot easier and faster, and required less specialized labor, resulting in significant cost savings.


SGML also provides benefits for the information retrieval process. Previously, documents were stored either in whole-text form, such as printed material, or as electronic files with "specific process markup." SGML aids query efficiency, information retrieval, and automatic hyperlinking. Query efficiency is improved by selecting specific headings, sections, or topics of a particular subject instead of an entire list of occurrences. The hierarchical structure of SGML documents allows hyperlinks to link together parts of documents such as words, titles, and sections.


Hypertext Markup Language

The Hypertext Markup Language (HTML) is based on SGML and is used to describe the general structure of publications (Lemay, 1995). The structural components of a publication in ASCII format are labeled using tags defined in HTML.
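Because HTML labels structural components with tags, a program can recover the structure without knowing anything about layout. As a small illustration, the sketch below uses Python's standard `html.parser.HTMLParser` to collect the headings of a page. The sample page is made up for the example, and only the heading tags `h1` through `h6` are handled.

```python
from html.parser import HTMLParser
from typing import Optional

class HeadingCollector(HTMLParser):
    """Collect (tag, text) pairs for every <h1>...<h6> element on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[str, str]] = []
        self._open: Optional[str] = None  # heading tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self._open = tag
            self.headings.append((tag, ""))

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:  # append text only while inside a heading
            tag, text = self.headings[-1]
            self.headings[-1] = (tag, text + data)

page = """<html><head><title>Fact Sheet</title></head><body>
<h1>Citrus Pests</h1><p>Scout weekly.</p><h2>Control</h2></body></html>"""
collector = HeadingCollector()
collector.feed(page)
print(collector.headings)  # [('h1', 'Citrus Pests'), ('h2', 'Control')]
```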


Web browsers such as Mosaic (Pfaffenberger, 1994), which is supported by the National Center for Supercomputing Applications, … Internet and World Wide Web (WWW). The browser then reads the HTML information and formats the text and images on the screen. The World Wide Web initiative began in 1990 and is a cooperative organization based at CERN, the European Particle Physics Laboratory in Switzerland (Lemay, 1995).

However, the tag selection in HTML is very limited. Currently, HTML Level One handles headings, paragraphs, images, and a few lists. HTML Level Two is similar to HTML Level One, but has additional features that support interactive forms, which can provide different options based on readers' input. Two other levels of HTML have been proposed. HTML Level Three, often called HTML+, will include elements for centered and right-aligned text, tables, mathematical equations, and the alignment of text and images next to each other.


Commercial Document Processing Models

Document preparation and editing involve defining the structure and content of both hardcopy and softcopy documents, while formatting is concerned with the actual physical layout of a document (Furuta et al., 1982). Formatting documents using generic and specific markup can serve the dual purpose of printed and on-screen display. Commercial products vary widely in their applications, from text preparation to the conversion of documents into SGML instances. Table … provides …

Academic and Academic/Commercial Document Processing Models

Academic and commercial products resulting from academic research have been developed for various stages of document processing. A review of several of these models follows.


Abstract Document Model

Kimura (1986) presents this document processing system as an interactive document editor based on an expressive document model for paper and electronic documents. Earlier papers have presented the concepts of the abstract document model in more detail (Shaw, 1980)(Furuta et al., 1982)(Kimura and Shaw, 1984)(Kimura, 1984). The basis of the document processing system is the notion of abstract and concrete objects, the hierarchical composition of both ordered and unordered objects, component sharing, and reference links (Kimura, 1984). Kimura (1984) classifies objects into either textual, tabular, mathematical, or pictorial classes.

Written in …, a prototype of the system has been in operation since the fall of 1983. The system consists of three major software modules. The graphical abstract document editor (ADE) integrates the abstract object module (AOM) and the window object module (WOM), producing the prototype system. Abstract object classes were also developed to write and view technical … and to allow structural editing within the document using specific commands (Kimura, 1986).


Andra Text Editor

Andra (Gutknecht and Winiger, 1984) is a modern text editor and formatter for the Lilith personal computer (Wirth, 1981). Professor Wirth developed Lilith at the Institut für Informatik of ETH Zurich from 1977 to 1980. Andra consists of three major parts: the input manager, the document manager, and the display manager. The input manager interprets user input, then translates commands into procedure calls. The document manager maintains the representation of documents. The display manager continuously shows part of the documents being edited on the screen.

Ohio State's Chameleon Project

The Chameleon translation software architecture (Mamrak et al., 1988a, 1988b, 1988c, and 1988d)(Nicholas and Mamrak, 1988) was renamed the Integrated Chameleon Architecture (ICA) (Mamrak et al., November 1990). The project studied the different ways that data can be represented (e.g., by different word processors) and translated into a desired coding scheme (e.g., SGML format). It addresses a broad variety of data representations. The construction and use of data … toolsets occurred over a five-year period (Barnes, July 1990)(Kaelbling, 1987)(Mamrak et al., September 1989)(Nicholas, 1988)(O'Connell, 1990)(Share, 1988a and 1988b). The goal of the developers was to design and implement a code-generating, user-friendly, data-translation architecture that would handle translations of data representations from a selected subset of data objects.

COBATEF System

The COBATEF system is a context-based text formatting system (Peels et al., 1985) consisting of both hardware and software areas of implementation.


An automatic text-element recognition mechanism takes advantage of the implicit structure of text, opening the way for a fully automatic text-processing system. The COBATEF system can recognize text elements by their context in two ways: the document text can be scanned for markup for element recognition, or a processing procedure can derive the document structure from the content. COBATEF's logical software package handles element identification and converts the text document into a logical structure. The formatter, which has vertical and horizontal components, produces device-independent print files.


Several papers are available that describe the hardware developments of the project (Janssen et al., 1985)(Nijland and Peels, 1985)(Peels, 1984).









FOAM

The name of the FOAM text formatting system (Ganzinger and Willmertinger, 1985) stands for Formatting and Meta-formatting. FOAM was developed specifically to run on available microcomputer systems. It supports meta (description) and text levels of formatting. Descriptions of text and document classes (at the meta level) are input to a macro processor, which draws specific formatting styles from a database of macro definitions and generates a specific formatter instance. At the text level, the resultant formatter class accepts textual input of the described document and produces a formatted document based on the formatting styles created at the meta level.


FORMEX

FORMEX (the formalized exchange of electronic publishing) (Guittet, 1985) was developed to confront problems with recovering varied formats of electronic data and text within the Project Management Department of the European Community's Office for Official Publications (OP). OP developed FORMEX as a way to store publications in a computer-readable format for information interchange between multiple authors, printers, and computer systems.

FORMEX unified two ways of interchanging electronic … based on the Format for Bibliographic Information Interchange on Magnetic Tapes (ISO 2709). Output software extracts data from files stored in a mainframe database. A formatter then produces and places the information into a file according to the specifications of the CCF. The resultant CCF file is then validated with a CCF parser. Automatic and manual editing, and pertinent SGML information, are added to enable the file to be upgraded to a FORMEX file. However, FORMEX is not ideally suited for the transfer of electronic documents consisting of textual information.

The second approach met ISO 8879-1986 standards for text preparation and interchange. It allowed the creation and interpretation of electronic documents by humans and computers. Typically, a document is produced by an author using a word processor. The document conforms to SGML standards by presenting the information in an explicit format. A formatter converts the document into SGML format based on ISO 8879-1986 standards and the DTD. An SGML parser validates the SGML file, and the document information is structured as specified by the CCF, upgrading it to a FORMEX file.


Integrated System for Complex Computer-Based Documents

Feiner (1981)(1982) developed a system of different programs, which drew pictures, composed pages, and … graphs, whose contents are made of nodes (pages). These pages can be nested into chapters to format documents such as books. The directed-graph structure was developed from previous research on the Hypertext Editing System (Carmody et al., 1969) and FRESS (van Dam and Rice, 1971), also known as the File Retrieval and Editing System. These text processing systems have useful information structuring and retrieval capabilities (Strandberg et al., 1976). The electronic document preparation system is a series of programs for modifying and then presenting a document. Its processes include picture layout, document layout, and document presentation.


Text Editor Lara

Lara (Gutknecht, 1985) is a text editor developed for the Lilith workstation (Wirth, 1981). It succeeds the Andra system (Gutknecht and Winiger, 1984) and does not depend upon a style file. Rather than applying style elements to document structural areas, Lara copies attributes from one place on the computer display to another, in the same and in other documents. This allows a particular group of documents to achieve the same format. Consistency in the format of displayed text allows a connection with the internal data structure. Thus, the internal data structure is not dependent on the editing …









Maestro

Maestro, also known as the Management Environment for Structured Text Retrieval and Organization, consists of tools that take advantage of structural model knowledge in bibliographic data (Macleod, 1990). Maestro was developed using both conceptual modeling and an object-oriented philosophy. It consists of a definitional facility and a query language for handling queries and updates. The definitional facility analyzes a document and constructs a second document that contains a structural representation of the original document. XGML (Exoterica Inc., 1987) was the commercial compiler used in this process. A document developed with the definitional facility consists of the content, attributes, and structure of the text. The query language was developed from previous work by the author (Macleod and Reuber, 1987) on document-retrieval systems. The main objective of this process was to develop a language that could naturally handle text processing applications.


Mixed Mode Document Processing System

Yamada et al. (1987) present this system as an extended document processing model for constructing mixed mode documents. The system contains both structuring and layout editing processes. The structuring process entails scanning … separate regions such as characters, figures, and tables. The layout editing process consists of both content and structure editors. An interactive pattern recognition process was also proposed. The method consists of both machine-dependent and human-dependent processes that reduce the psychological load on a human operator.


PEN

PEN (Allen et al., 1981), a hierarchical document editor, is a computerized manuscript preparation system for documents containing significant mathematical notation. The interactive formatter provides visual feedback as the author is typing the document. PEN's unique contribution is that it provides notation to simplify mathematical text entry.


TEXTNET

TEXTNET (Trigg and Weiser, 1986) is a network-based approach to structuring text. The research studied the effects of different text organization strategies on the scientific community. Text structuring has currently seen use at the local level as an aid to text manipulation. However, TEXTNET integrates into one approach both a local network of chunks of text within a document and linked documents consisting of on-line literature of a scientific nature. The text is stored in a way to make … text. For example, one chunk can support the results obtained in another chunk, whether in the same or in different documents.


Other Document Preparation Systems

Furuta (1989) examined the various capabilities, features, and structure of other document preparation systems such as TeX (Knuth, 1984), LaTeX (Lamport, 1985), troff (Kernighan, 1981), Scribe (Reid, 1980)(Unilogic, Ltd., 1984), Interleaf (Ilson, 1988), MacWrite (Apple Computer Inc., 1984), Xerox's Tioga (Teitelman, 1984 and 1985), MIT's Etude (Hammer et al., 1981)(Ilson, 1980), and IBM's Janus (Chamberlin et al., 1981)(Chamberlin et al., 1982). Lee and Malone (1988) explored solutions to problems in office-based systems, such as user-computer communication with different templates and document interchange between different word processing programs. The solutions were based on the development of extensions to the Information Lens System (Malone et al., 1987)(Malone et al., May 1987).

Appendix … provides other literature on software development and SGML that was reviewed but not included in this research. Other software products using a parser (not reviewed) include products from DocuPro, Compugraphic, Frame, Scribe (Smith, 1989b), and Xyvision.









Hypertext and Hypermedia Systems

Bush (1945) is generally credited with proposing the initial principles on which current hypertext systems are based. Current hypertext systems allow specific segments, or the entire document in question, to reference (link) one another and other on-line documents in a variety of sequences (Newcomb et al., 1991). A true hypertext system should make users feel that they can move freely through the information based solely on their own needs (Nielson, 1990). Hypermedia teams create multimedia (e.g., text, graphics, sound, and executable programs) documents with hyperlinks (Newcomb et al., 1991).

Several first-generation (pre-1980's) hypermedia systems include NLS/Augment (Englebart and English, 1968)(Englebart et al., 1973)(Englebart, 1984), FRESS (Meyrowitz, 1986), Thumb (Price, 1982), and … (McCracken and Akscyn, 1984)(Robertson et al., 1981). Conklin (1987) provides a review of these systems, which were primarily mainframe-based.


Another system from the late 1960's was HES (Carmody et al., 1969)(van Dam, 1988). The second-generation hypertext/hypermedia systems were research-oriented systems developed for use on workstations. These included … (Akscyn et al., 1988), a newer version of …; Neptune (Delisle and Schwartz, May 1986)(Delisle …); … (Meyrowitz, 1986); and NoteCards (Halasz et al., 1987)(Halasz, 1988)(Trigg, 1988). The next generation of hypertext/hypermedia systems is currently being developed for use on personal computers. Some of these include Guide (Brown, 1987)(Owl International, 1986), Hyperties (Shneiderman, 1987a and 1987b), and HyperWriter! (Ntergaid Inc., 1992).

Another investigation of hypertext structure and content produced Trellis (Stotts and Furuta, 1988 and 1989), which allowed the author to specify browsing semantics along with the structure and content parts of a document. Delisle and Schwartz (1987), Zellweger (1988), and Trigg (1988) provide other work on allowing the author or reader to specify traversal paths in hypertext documents. Products have been developed with hypertext capabilities based on SGML.


Smith (1989b) outlines three such products. Idex, a product of Office Workstations Limited, uses hypertext in a multi-user environment. Optical disks have been addressed as the output medium by the Optical Publishing Association. The Silversmith product … combines both hypertext capabilities and optical disk production. Some hypertext/hypermedia systems include those used for … Tompa, 1989); … 1988); page-oriented database management systems (Tompa, …); and software life-cycle documents (Garg and Scacchi, 1990).






















CHAPTER
PROCEDURES

Procedures For Specific Objective Number One


The first objective was to model the structure of a set of FCES publications and to represent the content of the publications in an application-independent form. The first procedure for specific objective number one was a document analysis of the sample FCES publications. Fifty FCES extension fact sheets and circulars made up the sample of publications. The document analysis identified the structural properties of the FCES publications, including the identification, naming, location, frequency, hierarchical order, and interrelationships of elements.


The second procedure for specific objective number one was to develop a model (DTD) that provided a vocabulary representation of the structural elements identified during the document analysis. International Standard ISO 8879-1986 (Standard Generalized Markup Language (SGML)) was used to define how structural elements were placed in the model. The DTD structure was compared with other industry-standard DTDs for compatibility. Currently, the Association of American Publishers' DTDs (i.e., book, article, and serial DTDs) are … DTDs were considered as alternatives to in-house development of models when the structure was similar. Transferability and portability of information in FCES publications were important considerations during the model development and selection process. XGML Validator (Exoterica Corporation, 1987), a commercial software product, was used to ensure that the DTD conformed to ISO 8879-1986 standards and SGML syntax. This was the initial validation step.


Procedures for Specific Objective Number Two


The second specific objective was to verify the model by automating and testing a process of using FCES publications in SGML form for electronic storage and delivery.


The first procedure for specific objective number two was to generate electronic files of the sample FCES publications based on the structure of the FCES model.

Currently, WordPerfect (WordPerfect Corporation, 1993a) is the FCES standard for word processing software.


FCES authors are using the WordPerfect (WordPerfect Corporation, 1993a) word processor with an additional pop-up menu (Appendix F) to apply both generic and specific styles as markup.


The pop-up menu and associated software tools (macros, styles, and soft fonts) are known as FAST-WP, Florida's Authoring Tool for WordPerfect.


After the development of the DTD in specific objective number one, FAST-WP underwent extensive revision to reflect the model.


FAST-WP was then used to tag the structural elements of the sample of FCES publications.


The second procedure for specific objective number two was to convert the tagged FCES publications into SGML format (ASCII tagged text).


In-house software (WP2SGML) created an SGML instance (ASCII tagged text file) for each FCES publication based on the model developed in specific objective number one.


XGML Validator software (Exoterica Corporation, 1987) parsed each FCES instance to verify that it conformed to the FCES model.


The third procedure for specific objective number two was to convert the fifty FCES instances into a format that several retrieval systems, such as FAIRS, Guide (InfoAccess, Inc., 1994), and Multimedia Viewer (Microsoft Corporation, 1994), can understand. The conversion of the instances into retrieval system format allowed a subjective evaluation of the model.


Each structural element in the model received a ranking of its importance for on-screen display. Elements were ranked as extremely important, somewhat important, or not needed for display. The conversion process also determined whether the translation of elements into each retrieval system format would be automatic or manual.


The generation of electronic databases from FCES publications in SGML format (instances) was used to verify the DTD.













CHAPTER


MODEL DEVELOPMENT PROCESS


Currently, no other college or institution has a model or model development process that FCES can review for developing application-independent document storage for publications.


Appendix describes an industry process that uses a generalized markup language as the preparatory step for application-independent document storage of publications.


An SGML tutorial (Graphic Communications Association, 1991) provided this process of model development and gave FCES some direction in the development of SGML applications.


Determine Sample of FCES Publications


The number of FCES publications chosen for the representative sample set was fifty.


The documents were selected from volumes one and two of the FAIRS DISC.


Only documents created after June 1993 were selected, to ensure that most publications were tagged with FAST-WP.


A total of eight hundred and ninety-five publications fell into this category.


Each filename begins with two letters that describe the heading under which the publication can be found.


The five digit number following the two letters is the actual ...









The Quattro Pro spreadsheet program (Borland International, Inc., 1992) was chosen to randomly select the fifty documents.


Appendix provides the filenames and titles that were randomly selected.
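The random draw itself can be illustrated with a short Python sketch. It is a stand-in for the spreadsheet procedure, not a reproduction of it: the filenames are generated placeholders that only follow the stated pattern of two heading letters plus a five digit number, and the seed is an arbitrary assumption that makes the draw repeatable.

```python
import random
import re

# Pattern stated in the text: two heading letters followed by a five digit number.
FILENAME = re.compile(r"^([A-Za-z]{2})(\d{5})$")

def parse_filename(name):
    """Split a publication filename into its heading letters and number."""
    match = FILENAME.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    return match.group(1), match.group(2)

def select_sample(filenames, sample_size=50, seed=None):
    """Draw a simple random sample without replacement."""
    if sample_size > len(filenames):
        raise ValueError("sample is larger than the population")
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return sorted(rng.sample(filenames, sample_size))

# Hypothetical population of 895 eligible filenames.
population = [f"XX{n:05d}" for n in range(1, 896)]
sample = select_sample(population, seed=1993)
print(len(sample), sample[:3])
```

Sampling without replacement guarantees that no publication is chosen twice, which is the property a random selection of fifty distinct documents needs.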


Results of Document Analysis on Selected FCES Publications


A graduate level course was the initial setting for a structural analysis of FCES publications (Harrison et al., 1992).


The class consisted mostly of computer and communication specialists.


The class was divided into two working groups, with each group having an identical subset of FCES publications.


Both groups did a document analysis on the FCES publications and wrote a tree diagram to represent the structure.


An attempt was made to develop one tree diagram from the structures of both trees.


Generally, each group preferred its own description of the FCES publication structure.


Keeping both tree structures would result in the development of two models to represent the same FCES publications.


Describing FCES publication structure in two formats would prevent one group from interpreting the other's information.


The authors' experience with the class showed that structural specifications for specific informational areas are open to varying interpretations.


Usually, the interpretation of structure in FCES publications was dependent on ...









At this point, the author did a more extensive document analysis on the sample of FCES publications.


Structural areas in each document were broken down to the textual (lowest) level, and definitions were given for each area of information.


Each information area definition was based on the review of the identical set of FCES publications.


Further refinements included relating text back to its parent structural elements, merging similar structural areas, deleting redundant structural areas, and restructuring and simplifying the structure.


The primary organization of the first level of structure was front matter and body matter.


These two main elements allow all other informational elements to attach at some level of the structure.


The tree diagram and structural definitions were the basis for the FCES model development.
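A tree diagram of this kind maps naturally onto a nested data structure. The Python sketch below is a simplified, hypothetical illustration of the two-branch organization: the element names (article, fm, bdy, tig, au, abs, sec, st, p) are taken from the FCES model figure, but the tree is not the complete model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One box in the tree diagram: an element name and its children."""
    name: str
    children: list = field(default_factory=list)

    def add(self, *kids):
        self.children.extend(kids)
        return self  # allow chained construction

def depth_of(root, target, depth=0):
    """Return the level at which `target` attaches below `root`, or None."""
    if root.name == target:
        return depth
    for child in root.children:
        found = depth_of(child, target, depth + 1)
        if found is not None:
            return found
    return None

# Simplified tree: every other element attaches under front matter or body.
article = Node("article").add(
    Node("fm").add(Node("tig"), Node("au"), Node("abs")),
    Node("bdy").add(Node("sec").add(Node("st"), Node("p"))),
)
print([child.name for child in article.children])  # ['fm', 'bdy']
print(depth_of(article, "p"))                        # 3
```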


Model (DTD) Development


Selection is important in this phase: one must decide whether to use a custom model for private use or to modify an existing model in the marketplace.


Modifying an existing model (DTD) in the marketplace could increase public access to the information in FCES publications. The information in FCES publications can be accessed and used by those systems that can read and interpret the existing model.


Removal ...









The Association of American Publishers (AAP, 1987) models are the only models recognized as an American National Standards Institute (ANSI) standard.


The AAP models are entitled Book, Article, and Serial.


A comparative analysis of the AAP models with the FCES tree diagram and structural definitions was initiated.


Upon review, the AAP Article DTD most closely resembled the structural properties and definitions of information in the FCES publications.


Appendix lists the tree diagrams that represent the FCES publication structure using the AAP Article model and some unique elements.


Figure shows the current FCES model.


The first set of elements unique to the FCES publication model were hyperlink (hyp), link word (wordr), and action (act).


The AAP Article model does not provide a way to reference a hyperlink in publications.


For example, "Figure 1" in the text of an FCES publication can be tagged as the hyperlink link word (wordr), with an action (act) that the retrieval software can act on. Two attributes were defined for the element act to describe the type of action to be performed on each hyperlink.


The first attribute was named "type." The attribute values for "type" could be text, graphic, executable program, audio, table, or data record. The second attribute of the element was named "descr." This attribute stores in an FCES instance the information that was ...








The hyperlink elements were placed as an inclusion within the content model of the article element. This allows a hyperlink to occur anywhere within an FCES publication.
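To illustrate how retrieval software might act on these elements, the Python sketch below scans an instance for hyp elements and checks the act element's type attribute. Only the element names (hyp, wordr, act), the two attribute names (type, descr), and the six type values in the FCES model figure come from the model; the instance markup (all tags written out, attribute values quoted), the file name, and the function name are hypothetical.

```python
import re

# Values of the act element's "type" attribute in the FCES model figure.
ACTION_TYPES = {"text", "bitmap", "exeprgm", "audio", "table", "datarec"}

HYP = re.compile(
    r"<hyp>\s*<wordr>(?P<word>.*?)</wordr>\s*"
    r'<act\s+type="?(?P<type>\w+)"?\s+descr="(?P<descr>[^"]*)"\s*>'
    r"(?:.*?</act>)?\s*</hyp>",
    re.DOTALL,
)

def find_links(instance):
    """Return (link word, action type, description) for every hyp element."""
    links = []
    for match in HYP.finditer(instance):
        kind = match.group("type")
        if kind not in ACTION_TYPES:
            raise ValueError(f"undeclared action type: {kind}")
        links.append((match.group("word"), kind, match.group("descr")))
    return links

# Hypothetical fragment: the hyperlink sits inside a paragraph, which the
# model permits because hyp is an inclusion on the article element.
sample = ('<p>See <hyp><wordr>Figure 1</wordr>'
          '<act type="bitmap" descr="fig1.bmp"></act></hyp> for details.</p>')
print(find_links(sample))  # [('Figure 1', 'bitmap', 'fig1.bmp')]
```

A regular expression is enough for this illustration; a production tool would rely on a validating SGML parser instead.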


The second set of elements unique to the FCES publication model were the rectangular coordinates for simple and complex tables (tdim).


The elements provided a way to describe the row height (rowhgt) and column width (colwid) in inches for each row/column combination in simple and complex tables.


Their main purpose was to simplify the computation of table dimensions.


The row height and column width were listed as comma-delimited numbers in 1200ths of an inch, and described the columns and rows of a table.
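The computation these elements simplify can be sketched in Python. The sketch assumes colwid and rowhgt hold comma-delimited integers in 1200ths of an inch, as described above; the function names and the sample values are hypothetical.

```python
def parse_dimensions(value):
    """Convert a comma-delimited list of 1200ths of an inch to inches."""
    return [int(part) / 1200 for part in value.split(",") if part.strip()]

def table_size(colwid, rowhgt):
    """Return (width, height) of a table in inches from its tdim values."""
    return sum(parse_dimensions(colwid)), sum(parse_dimensions(rowhgt))

def cell_origin(colwid, rowhgt, row, col):
    """Return the (x, y) offset in inches of a cell's top-left corner,
    counting rows and columns from zero."""
    cols = parse_dimensions(colwid)
    rows = parse_dimensions(rowhgt)
    return sum(cols[:col]), sum(rows[:row])

# Hypothetical tdim content: three columns and two rows.
print(table_size("1800, 2400, 1200", "300,300"))         # (4.5, 0.5)
print(cell_origin("1800, 2400, 1200", "300,300", 1, 2))  # (3.5, 0.25)
```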


Five attributes were added to the complex table header (cth), simple table cell, and complex table cell (cte) elements for retrieval software displaying tables on-screen.


The first attribute (shaded) described whether the element was shaded or not (y/n).


The remaining attributes, top line (topline), bottom line (botline), left line (lftline), and right line (rgtline), described the lines that surround each of the three elements.


The four line attributes were given the same declared value, NAME. The name defined whether each line surrounding the three elements was (n)one, (s)ingle, (d)ouble, d(a)shed, d(o)tted, (t)hick, or (e)xtra thick.
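The one-letter codes can be checked mechanically before a table is displayed. This Python sketch validates the shaded and line attributes of one cell; the letter-to-style mapping follows the text above, while the function name, the dictionary form of the attributes, and the rule that a missing attribute is an error are assumptions of the sketch.

```python
# One-letter NAME values for the four line attributes (letter in parentheses).
LINE_STYLES = {
    "n": "none", "s": "single", "d": "double", "a": "dashed",
    "o": "dotted", "t": "thick", "e": "extra thick",
}
BORDER_ATTRS = ("topline", "botline", "lftline", "rgtline")

def check_cell(attrs):
    """Validate a cell's attributes and return its border styles by side.

    In this sketch, a missing or unknown value raises ValueError.
    """
    if attrs.get("shaded") not in ("y", "n"):
        raise ValueError("shaded must be y or n")
    borders = {}
    for name in BORDER_ATTRS:
        code = attrs.get(name)
        if code not in LINE_STYLES:
            raise ValueError(f"{name}: unknown line style {code!r}")
        borders[name] = LINE_STYLES[code]
    return borders

cell = {"shaded": "n", "topline": "s", "botline": "d",
        "lftline": "n", "rgtline": "t"}
print(check_cell(cell))
```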


Appendix E defines the elements, attributes, and entities ...











[Figure listing: DTD element and attribute declarations for the FCES model, covering the article, front matter (fm), body (bdy), sections, lists, hyperlinks (hyp, wordr, act), simple and complex tables, and table dimensions (tdim, colwid, rowhgt).]

Figure -- The FCES Model

[Figure listing continued: complex table element and attribute declarations (align, valign, shaded, topline, botline, lftline, rgtline), with a comment stating that NAME defines whether there is (n)o line or a (s)ingle, (d)ouble, d(a)shed, d(o)tted, (t)hick, or (e)xtra-thick line at the top, left, right, and bottom of each cell, followed by character entity declarations.]

<!ENTITY aacute SDATA "[aacute]">
<!ENTITY Aacute SDATA "[Aacute]">
<!ENTITY auml SDATA "[auml]">
<!ENTITY Auml SDATA "[Auml]">
<!ENTITY eacute SDATA "[eacute]">
<!ENTITY Eacute SDATA "[Eacute]">
<!ENTITY iacute SDATA "[iacute]">
<!ENTITY oacute SDATA "[oacute]">
<!ENTITY Oacute SDATA "[Oacute]">
<!ENTITY ouml SDATA "[ouml]">

Figure -- continued.

<!ENTITY Ouml SDATA "[Ouml]">
<!ENTITY uacute SDATA "[uacute]">
<!ENTITY Uacute SDATA "[Uacute]">
<!ENTITY uuml SDATA "[uuml]">
<!ENTITY Uuml SDATA "[Uuml]">
<!ENTITY bull SDATA "[bull]">
<!ENTITY squf SDATA "[squf]">
<!ENTITY diams SDATA "[diams]">
<!ENTITY cent SDATA "[cent]">
<!ENTITY check SDATA "[check]">
<!ENTITY copy SDATA "[copy]">
<!ENTITY ndash SDATA "[ndash]">
<!ENTITY mdash SDATA "[mdash]">
<!ENTITY deg SDATA "[deg]">
<!ENTITY prime SDATA "[prime]">
<!ENTITY Prime SDATA "[Prime]">
<!ENTITY female SDATA "[female]">
<!ENTITY agr SDATA "[agr]">
<!ENTITY bgr SDATA "[bgr]">
<!ENTITY dgr SDATA "[dgr]">
<!ENTITY Dgr SDATA "[Dgr]">
<!ENTITY mgr SDATA "[mgr]">
<!ENTITY sgr SDATA "[sgr]">
<!ENTITY Sgr SDATA "[Sgr]">
<!ENTITY male SDATA "[male]">
<!ENTITY minus SDATA "[minus]">
<!ENTITY times SDATA "[times]">
<!ENTITY divide SDATA "[divide]">
<!ENTITY plusmn SDATA "[plusmn]">
<!ENTITY le SDATA "[le]">
<!ENTITY ge SDATA "[ge]">
<!ENTITY ne SDATA "[ne]">
<!ENTITY frac12 SDATA "[frac12]">
<!ENTITY frac13 SDATA "[frac13]">
<!ENTITY frac14 SDATA "[frac14]">
<!ENTITY frac23 SDATA "[frac23]">
<!ENTITY frac34 SDATA "[frac34]">
<!ENTITY Ntilde SDATA "[Ntilde]">
<!ENTITY ntilde SDATA "[ntilde]">
<!ENTITY ldquo SDATA "[ldquo]">
<!ENTITY rdquo SDATA "[rdquo]">
<!ENTITY reg SDATA "[reg]">
<!ENTITY trade SDATA "[trade]">
<!ENTITY iquest SDATA "[iquest]">
<!ENTITY iexcl SDATA "[iexcl]">
<!ENTITY laquo SDATA "[laquo]">
<!ENTITY raquo SDATA "[raquo]">

Figure -- continued.









The AAP Article model provides opportunities to represent multiple elements of the FCES publication model.


One such opportunity is the provision in the AAP Article model of five different types of lists.


The FCES model requires three different lists.


The FCES model consists of bulleted (l1), unmarked (l2), and enumerated (l3) types of lists. The AAP model also provides for three types of emphasis (i.e., e1, e2, and e3) other than elements such as bold and italics (it). Currently, the FCES model includes scientific name and reference title (not currently used) as emphasis types.


The superscript and subscript elements were required in the FCES model. The two elements were not initially found in the main part of the AAP Article model, but are part of the AAP Math model, which is part of the AAP Article model. A geometric formula consists of superscript and subscript elements in the AAP Math model.


Adding the geometric formula element to the FCES model at the structural locations specified by the AAP Math model allowed the addition of superscript and subscript.


Table -1 provides a line-by-line explanation of the FCES model, including the new additions.












[Table -1 -- Line-by-line explanation of the FCES model]









With the exception of the hyperlink elements and the column and row elements of simple and complex tables, the model is a subset of the AAP Article model.


Having the FCES DTD as a subset of the AAP Article model provides three benefits for FCES publications.


First, little customization is needed to convert the information in FCES publications to ANSI standard format.


Second, the information in FCES publications has the benefit of terminology and names that are widely used in the United States.


Third, the FCES model provides potential portability of FCES publications to systems that read information in AAP Article model format.


The basis for the design and development of the FCES model was to allow information in FCES publications to be described in a form independent of any computer hardware and software. The process established verifies the model for application-independent storage of FCES publications.













CHAPTER


MODEL VERIFICATION


FCES Publication Preparation and Conversion to SGML Format


Identification of Model Elements in FCES Publications


An electronic toolkit, known as FAST-WP, was developed at the University of Florida to make it easy for authors and word processors to add special codes to WordPerfect (WordPerfect Corporation, 1993a) files running under DOS (Cilley and Watson, 1992a and 1992b).


The special codes, WordPerfect styles with generic stylenames, were used to define structural areas within FCES publications. The styles were placed in FCES publications via a pop-up menu (Appendix F).


FAST-WP was modified after the FCES DTD was developed to include generic styles that reflect the structural elements of the model.


The author then used FAST-WP to apply generic styles to the subset of FCES publications.


During this tagging process, FAST-WP was tested, edited, and updated as problems were encountered.


Appendix provides a table describing the relationships between the model elements and their generic styles that were placed in FCES publications.









Conversion of FCES Publications Into SGML Format


A computer program, WP2SGML (WordPerfect-to-SGML), was written to generate SGML instances from the FCES publications (Harrison et al., 1992).


Currently, WP2SGML converts an FCES publication into an ASCII file with start and end tags that describe its structure.


WP2SGML was written in C++, using the object-oriented features of the language.


Each style has its own object for conversion from WordPerfect to SGML.


The objects use inheritance, so objects can be processed at a general or specific level. Instances (tagged FCES documents) generated by WP2SGML were validated for conformance to the FCES document model (DTD) using Exoterica's XGML Validator software (Version 1991).
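WP2SGML itself was written in C++. The Python sketch below only illustrates the design described above, one conversion object per style with general behavior in a base class and specific behavior in subclasses; the class names, the generic style names ("Para", "SecTitle"), and the escaping rule are hypothetical.

```python
class StyleConverter:
    """General level: wrap a styled run of text in start and end tags."""
    tag = None

    def convert(self, text):
        return f"<{self.tag}>{self.body(text)}</{self.tag}>"

    def body(self, text):
        # Replace SGML delimiter characters with numeric character references.
        return text.replace("&", "&#38;").replace("<", "&#60;")

class Paragraph(StyleConverter):
    tag = "p"

class SectionTitle(StyleConverter):
    tag = "st"

    def body(self, text):
        # Specific level: trim stray spaces, then reuse the general rule.
        return super().body(text.strip())

# Hypothetical generic style names mapped to converter objects.
CONVERTERS = {"Para": Paragraph(), "SecTitle": SectionTitle()}

def convert(style_name, text):
    return CONVERTERS[style_name].convert(text)

print(convert("SecTitle", "  Planting & Care "))  # <st>Planting &#38; Care</st>
```

Adding a new style then means adding one subclass that overrides only what differs from the general behavior.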


Initially, some documents did not conform to the FCES model because of errors in the logic of WP2SGML, tagging errors by the author, or structural problems in the FCES model. Editing, testing, evaluating, and updating the FCES DTD was done when structural differences were found in tagged FCES documents.


Many tags are placed with WordPerfect styles.


Certain features of WordPerfect, such as footnotes, figures, and tables, are readily apparent to the conversion program.


The program uses the codes embedded in the WordPerfect file to detect, for example, where a footnote begins and ends, and where its tags should be placed.









A difficult conversion is the paragraph, where the program requires the recognition of hard returns to correctly detect the beginning and ending of a paragraph.


Once the beginning and end of a paragraph are detected, the software places start and end tags into the ASCII file.
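A minimal Python sketch of the paragraph rule follows. It assumes the WordPerfect text has already been decoded so that each hard return appears as a single marker character and soft returns (line wraps inside a paragraph) are gone; the default marker `"\n"` and the function name are assumptions, not WordPerfect's internal codes or WP2SGML's code.

```python
def tag_paragraphs(text, hard_return="\n"):
    """Wrap each run of text ended by a hard return in <p> ... </p>.

    Consecutive hard returns (blank lines) do not produce empty paragraphs.
    """
    tagged = []
    for segment in text.split(hard_return):
        segment = segment.strip()
        if segment:  # skip the empty segments created by blank lines
            tagged.append(f"<p>{segment}</p>")
    return "\n".join(tagged)

doc = "First paragraph, one line.\n\nSecond paragraph.\nThird paragraph.\n"
print(tag_paragraphs(doc))
```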


There are several structural elements that are difficult to detect in a standard WordPerfect document. These elements must be tagged with styles for identification so the conversion program can detect them. The pop-up menu is used to tag those structural elements, such as heading levels and hyperlinks.


Conversion of FCES Instances Into Retrieval Format


Each element was ranked for importance in on-screen display before the conversion of FCES instances into retrieval system format.


The author used the rankings as a starting point for evaluating possible limitations that a specific retrieval system might have in delivering information from FCES publications.

Retrieval software could be deemed inadequate when extremely important elements cannot be displayed on-screen.


The initial ranking also provided the author a way to justify whether retrieval software can adequately display FCES information via on-screen display or different hyperlink methods (full screen or pop-up box material). The ranking levels for the elements were extremely important (A), somewhat ...

... somewhat important and fourteen that were not needed for display.
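The adequacy rule stated above can be expressed as a small check. In this Python sketch the three ranking levels mirror the text, while the element rankings, the set of displayable elements, and the function name are hypothetical inputs.

```python
from enum import Enum

class Rank(Enum):
    EXTREMELY_IMPORTANT = 1
    SOMEWHAT_IMPORTANT = 2
    NOT_NEEDED = 3

def is_adequate(rankings, displayable):
    """Retrieval software is inadequate if any extremely important element
    cannot be displayed on-screen. Returns (adequate, missing elements)."""
    missing = sorted(
        name for name, rank in rankings.items()
        if rank is Rank.EXTREMELY_IMPORTANT and name not in displayable
    )
    return not missing, missing

# Hypothetical rankings for a few FCES elements.
rankings = {"atl": Rank.EXTREMELY_IMPORTANT, "tbl": Rank.EXTREMELY_IMPORTANT,
            "abs": Rank.SOMEWHAT_IMPORTANT, "crd": Rank.NOT_NEEDED}
print(is_adequate(rankings, displayable={"atl", "abs"}))  # (False, ['tbl'])
```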


Each retrieval system requires different processes to successfully translate FCES instances into a format it understands.


A brief description of the processes for each retrieval system is as follows.


Candide, The Semantic Data Modeling Language For FAIRS DISC8 And DISC9


Query searches of tagged documents that are themselves stored in a database are cumbersome and inefficient (Beck and Watson, 1992). The Candide structure was designed specifically for storing both the structure and content (semantic) of FCES publications.


Candide is a database management system that has data storage, retrieval, and query facilities. The semantic model (Candide) is used to decompose each publication into objects (Beck et al., 1989a).


The objects represent the meaning of words and can interact with each other to represent complex data (Beck, 1989b).


A document can be represented using these objects.


Candide differs from other semantic data models in its uniform treatment of data objects, query objects, and view objects.


Candide provides query searches about the structural relationships between objects.
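As a rough illustration of decomposing a document into objects and querying their structural relationships, the Python sketch below builds a small object graph. It does not use Candide or its query language; the Obj class, the decompose and containing functions, and the sample content are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    kind: str                                  # element type, e.g. "sec" or "p"
    text: str = ""
    parts: list = field(default_factory=list)  # structural "has part" links

def decompose(kind, text="", *parts):
    """Create one object and link it to its part objects."""
    return Obj(kind, text, list(parts))

def containing(root, part_kind):
    """Structural query: every object that directly contains a `part_kind` part."""
    hits = [root] if any(p.kind == part_kind for p in root.parts) else []
    for part in root.parts:
        hits.extend(containing(part, part_kind))
    return hits

doc = decompose("article", "",
    decompose("fm", "", decompose("atl", "Citrus Care")),
    decompose("bdy", "",
        decompose("sec", "", decompose("st", "Fertilizer"), decompose("tbl")),
        decompose("sec", "", decompose("st", "Pests"), decompose("p", "Scout weekly."))))
print([obj.kind for obj in containing(doc, "tbl")])  # ['sec']
```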









FAIRS CD-ROM DISC8


FCES instances are grouped together in handbooks by topic.


A program called XTRAN (Exoterica, 1990) uses a set of rules to convert the SGML instances into an object-oriented format (fname.out) similar to PROLOG. The translation program (CDM) determines their structure and translates each file into the Candide database (Beck et al., 1989a).


These Candide files are converted into ASCII files (fname.asc) using DB2ASC, where each individual object chunk within a file was treated as a single ASCII file.


This provides ASCII chunks that link together through hyperlinks within the SGML instance.


Editors then review the chunks for formatting errors and prepare the ASCII files (chunks) in a format for on-screen display.


Tables represent a major problem encountered by editors and in translation.


The editors had to chunk tables by hand because XTRAN did not have sufficient rules to support tables.


Chunking involves breaking down a document into smaller pieces.
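Chunking by section can be sketched with a regular expression in Python, assuming the instance writes out every <sec> start and end tag and that sec elements are not nested. The function name and sample instance are hypothetical, and XTRAN's actual rules are not reproduced here.

```python
import re

SEC = re.compile(r"<sec>.*?</sec>", re.DOTALL)

def chunk_by_section(instance):
    """Split an instance into one chunk per sec element, in document order."""
    return [match.group(0) for match in SEC.finditer(instance)]

instance = ("<article><fm><atl>Title</atl></fm><bdy>"
            "<sec><st>One</st><p>a</p></sec>"
            "<sec><st>Two</st><p>b</p></sec></bdy></article>")
for number, chunk in enumerate(chunk_by_section(instance), start=1):
    print(number, chunk)
```

Each chunk could then be written to its own ASCII file, matching the one-chunk-per-file treatment described above.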


Tables were extracted from the WordPerfect file and saved as an ASCII file.


Table extraction allows simultaneous editing of tables and other chunks from an SGML instance.


The final step in the procedure is a two-step process (TXT2OBJ) that translates the ASCII files (fname.asc) into text files. The first step removes the hard returns and places special ... step.


After this conversion, the objects are placed into a retrievable database such as DISC8.


The database includes both data files and index files. The index file is a table that references each chunk of information in DISC8.
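The role of an index file can be sketched in Python as a table mapping each chunk identifier to its position in the data file, so a chunk is fetched by lookup rather than by scanning. The (offset, length) layout, the chunk identifiers, and the function names are hypothetical, not the DISC8 file format.

```python
def build_index(chunks):
    """Concatenate (chunk id, text) pairs into one data string and build an
    index table mapping each chunk id to its (offset, length) in the data."""
    data_parts, index, offset = [], {}, 0
    for chunk_id, text in chunks:
        encoded = text.encode("ascii")
        index[chunk_id] = (offset, len(encoded))
        data_parts.append(encoded)
        offset += len(encoded)
    return b"".join(data_parts), index

def fetch(data, index, chunk_id):
    """Look up a chunk through the index instead of scanning the data."""
    offset, length = index[chunk_id]
    return data[offset:offset + length].decode("ascii")

data, index = build_index([("fs001", "Citrus care."), ("fs002", "Pest control.")])
print(index)                        # {'fs001': (0, 12), 'fs002': (12, 13)}
print(fetch(data, index, "fs002"))  # Pest control.
```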


DISC8 production relied on editors for the on-screen display and chunking of documents; the SGML elements of the FCES model were not used in these procedures.


A shell program (WP2DB) was written to create, script, and run three batch files for placing WordPerfect files into a Candide database.


Each step of the process has its own utility.


A failure at any step stops the processing of that document.


The first batch file reads WordPerfect files, then converts them into SGML format (fname.sgm) using WP2SGML.


Tables were extracted and saved in ASCII format before this process.


The shell then ensures that each instance is a valid SGML document.


The second batch file reads the SGML instances and creates command line parameters for XTRAN. It runs XTRAN and converts each instance into objects (fname.out).


The third batch file reads the fname.out file structure and initializes parameters for the object format in CDM. CDM translates the object files into Candide database format and places them in the database.
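The control flow of such a shell, where each step must succeed before the next runs and a failure stops only the current document, can be sketched in Python. The step functions are hypothetical stand-ins for WP2SGML, the SGML validity check, XTRAN, and CDM; no real command lines are reproduced.

```python
def run_pipeline(documents, steps):
    """Run every step, in order, for each document.

    A step is a (name, callable) pair whose callable returns True on success.
    A failure stops processing of that document only; the other documents
    still go through the whole pipeline. Returns {document: status}.
    """
    status = {}
    for doc in documents:
        status[doc] = "ok"
        for name, step in steps:
            if not step(doc):
                status[doc] = f"failed at {name}"
                break
    return status

# Hypothetical stand-ins; here validation fails for names starting with "bad".
steps = [
    ("WP2SGML", lambda doc: True),
    ("validate", lambda doc: not doc.startswith("bad")),
    ("XTRAN", lambda doc: True),
    ("CDM", lambda doc: True),
]
print(run_pipeline(["fs001", "bad002"], steps))
# {'fs001': 'ok', 'bad002': 'failed at validate'}
```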


Figure 5-1 describes the production process for DISC8.














































Figure 5-1 -- The production process for DISC8


FAIRS CD-ROM DISC9


The significant difference between the development procedures for DISC8 and DISC9 was the use of the explicit structure of the FCES instances (SGML elements). A parser was developed to make a one-step process out of the XTRAN and CDM ...









The parser has one grammar rule for each SGML element.


The grammar rules embody the way of analyzing the document that is being parsed.


Along with each rule is a template containing the Candide object for a specific SGML element. Each template is filled with the actual data that produced the object.


The parser uses grammar rules to translate the input string and place the data into a template.


The template produces the data object that is put into the Candide database.


The rules and templates of the chart parser show how the elements will appear in the Candide database. The grammar rules from the chart parser are also stored in the Candide database.
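The pairing of one rule per element with an object template can be illustrated in Python. This sketch is not a chart parser: it walks an XML rendering of an instance with the standard `xml.etree.ElementTree` module (assuming every tag is written out, which real SGML instances need not do), and the template syntax is hypothetical, not Candide's notation.

```python
import xml.etree.ElementTree as ET

# One entry per element: the template for the object that element produces.
# The object syntax is hypothetical, not Candide's actual notation.
TEMPLATES = {
    "article": "article(id={id}, parts=[{parts}])",
    "sec": "section(id={id}, parts=[{parts}])",
    "st": 'title(id={id}, text="{text}")',
    "p": 'para(id={id}, text="{text}")',
}

def build(element, counter):
    """Apply the rule for this element: build its parts first, then fill
    the element's template with the actual data."""
    if element.tag not in TEMPLATES:
        raise ValueError(f"no rule for element <{element.tag}>")
    counter[0] += 1
    object_id = counter[0]
    parts = [build(child, counter) for child in element]
    return TEMPLATES[element.tag].format(
        id=object_id, text=(element.text or "").strip(), parts=", ".join(parts))

doc = ET.fromstring("<article><sec><st>Pests</st><p>Scout weekly.</p></sec></article>")
print(build(doc, [0]))
```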


Figure describes the production process for DISC9.








Multimedia Viewer


Microsoft Multimedia Viewer (1994) is frame-based retrieval software that reads files in Rich Text Format (RTF).


All files created from SGML instances must be saved as RTF files (i.e., topic files).


RTF statements, which are specially formatted tags that specify a particular type of formatting information, are presented in these topic files.


Multiple topic files are grouped together in the Viewer project file.


Other elements of a topic file include destinations, control symbols, and groups.


Both font and color tables can be developed to define topic information.
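A minimal Python sketch of generating an RTF topic file with a font table and a color table is shown below. The '#' and '$' footnotes for a topic's context string and title follow the WinHelp convention, which is assumed here to apply to Multimedia Viewer topic files; the function names, font, colors, and sample topic are hypothetical.

```python
def rtf_escape(text):
    """Escape the three characters that RTF treats as markup."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

def rtf_topic_file(topics, font="Times New Roman"):
    """Build one RTF topic file.

    `topics` is a list of (context string, title, paragraphs) tuples.
    The file starts with a font table and a color table; each topic ends
    with \\page.
    """
    lines = [
        r"{\rtf1\ansi\deff0",
        r"{\fonttbl{\f0\froman " + font + r";}}",
        r"{\colortbl;\red0\green0\blue0;\red0\green128\blue0;}",
    ]
    for context, title, paragraphs in topics:
        # WinHelp-style footnotes: '#' = context string, '$' = topic title.
        lines.append(r"#{\footnote " + context + r"}${\footnote "
                     + rtf_escape(title) + "}")
        lines.append(r"\pard\f0\fs28\b " + rtf_escape(title) + r"\b0\par")
        for paragraph in paragraphs:
            lines.append(r"\pard\f0\fs24 " + rtf_escape(paragraph) + r"\par")
        lines.append(r"\page")
    lines.append("}")
    return "\n".join(lines)

print(rtf_topic_file([("fs001", "Citrus Care", ["Water young trees weekly."])]))
```

Here the program supplies only the structure (topics, titles, paragraphs), and the viewer applies the formatting, which matches the division of labor described for Multimedia Viewer.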


The key point here is that Multimedia Viewer provides the formatting, while the user provides the structure.


A WordPerfect macro (SGML2RTF) was written to automate the process from SGML instance to Multimedia Viewer format.


The


process


is as follows:


Put


all


instances


graphics


converted


into


directory.


Select


that


directory


conversion.


Read


"factshts.tag" file


elements


to be considered.


global


parameters.


Retrieve


footnote


elements.


Copy


Copy


footnote


elements


sections





to utility


to utility


document.


document.


Create


frame


per


document


level


(section


,









Format


any


elements


(e.g.,


lists,


topic


paragraphs,


paragraphs)


each


section.


Write


out


file.


Generate


the


table


contents


entry


the


current


instance.


Next


instance


goes


through


the


conversion


process.


The


macro


ends.


Use


Multimedia


Viewer


to create


the


fname.mvb


file


from


the


project


file,


graphic


files,


RTF


files.
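The macro's control flow (collect the instances from one directory, pull out footnotes, build one frame per section, write a file, and record a table of contents entry before moving to the next instance) can be sketched in Python as below. This is a hypothetical re-expression, not the WordPerfect macro: it assumes XML-compatible instances without tag minimization, non-nested sections, and the illustrative element names `art`, `title`, `sec`, `p`, and `fn`; it writes a plain-text stand-in rather than RTF.

```python
import pathlib
import xml.etree.ElementTree as ET


def convert_instance(path):
    """Return (frames, footnotes, toc_entry) for one instance (hypothetical tag names)."""
    root = ET.parse(path).getroot()
    # Retrieve footnote elements first, as the macro does.
    footnotes = ["".join(fn.itertext()).strip() for fn in root.iter("fn")]
    frames = []
    for sec in root.iter("sec"):  # one frame per section-level element
        title = sec.findtext("title", default="").strip()
        paras = ["".join(p.itertext()).strip() for p in sec.iter("p")]
        frames.append({"title": title, "paragraphs": paras})
    # The instance title becomes its table of contents entry.
    toc_entry = root.findtext("title", default=path.stem).strip()
    return frames, footnotes, toc_entry


def convert_directory(directory):
    """Convert every *.sgm instance in `directory`; return the table of contents."""
    toc = []
    for path in sorted(pathlib.Path(directory).glob("*.sgm")):
        frames, footnotes, toc_entry = convert_instance(path)
        lines = []
        for frame in frames:
            lines.append(f"== {frame['title']} ==")
            lines.extend(frame["paragraphs"])
        if footnotes:
            lines.append("-- footnotes --")
            lines.extend(footnotes)
        # Write out the file for this instance (plain-text stand-in for RTF).
        path.with_suffix(".txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
        toc.append(toc_entry)  # then move on to the next instance
    return toc
```

Because each instance is handled independently, adding publications to the directory adds work linearly and needs no change to the conversion logic.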


Guide

Guide (InfoAccess, 1994) is an electronic publishing system that is designed around an object-oriented information model. The model presents information as a series of linked objects and manages the relationships between them. All document components, from a single word to a graphic, are represented as objects. Once defined, each object can be linked to other objects. Guide provides live, or hot, objects that are activated with the mouse using reference, note, expansion, and command buttons. Guide also provides a scripting language (LOGiiX) to write command button definitions. Guide differs from Multimedia Viewer in that it provides the structure for on-screen display, while the user provides how the information will be displayed (formatting).
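Guide's object model, in which every component is an object that can be linked to others through reference, note, expansion, or command buttons, can be approximated with a few Python data types. The sketch below illustrates the idea only: the class and function names are hypothetical, the behaviors assigned to each button type are simplified assumptions, and the command button calls a Python function where Guide would run a LOGiiX script.

```python
from dataclasses import dataclass, field
from enum import Enum


class ButtonKind(Enum):
    REFERENCE = "reference"   # jump to another object
    NOTE = "note"             # show another object's content as a pop-up
    EXPANSION = "expansion"   # replace the button text in place with more content
    COMMAND = "command"       # run a script (a plain Python callable here)


@dataclass
class GuideObject:
    name: str
    content: str
    # Button label -> (kind, target); target is a GuideObject or, for commands, a callable.
    links: dict = field(default_factory=dict)

    def link(self, label, kind, target):
        self.links[label] = (kind, target)


def activate(obj, label):
    """Simulate clicking the hot object `label` inside `obj`."""
    kind, target = obj.links[label]
    if kind is ButtonKind.REFERENCE:
        return ("jump", target.name)
    if kind is ButtonKind.NOTE:
        return ("popup", target.content)
    if kind is ButtonKind.EXPANSION:
        return ("expand", obj.content.replace(label, target.content, 1))
    if kind is ButtonKind.COMMAND:
        return ("run", target())
    raise ValueError(f"unknown button kind: {kind}")
```

Storing links on the objects themselves, rather than in the text, is what lets the same object be reused from several places in a document.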










… the SGML instances into HML (hypertext media language) format. The steps of the program are as follows:

1. Read and parse an instance (fname.sgm). The first pass through each instance picks up the information necessary for the conversion, as well as determining whether any tables are present.
2. Convert any elements in the instance to HML.
3. Write out the HML file.
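The two-pass structure of the HML conversion program can be sketched as follows: a first pass gathers information (here, which elements occur and whether any tables are present) and a second pass rewrites elements. The target tags in `ELEMENT_MAP` are placeholders, because HML syntax is not given here, and the regular-expression tag scanner assumes simple start- and end-tags with no comments, marked sections, or tag minimization.

```python
import re

# Placeholder target markup; the real HML tag set is an assumption left open here.
ELEMENT_MAP = {"title": ("<H1>", "</H1>"), "p": ("<P>", "</P>"), "li": ("<LI>", "</LI>")}

# Matches a start-tag or end-tag: group 1 is "/" for end-tags, group 2 the element name.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")


def first_pass(text):
    """Collect what the conversion needs to know before rewriting anything."""
    names = {m.group(2).lower() for m in TAG.finditer(text) if not m.group(1)}
    return {"elements": names, "has_tables": "table" in names}


def second_pass(text):
    """Rewrite mapped tags; drop unmapped tags but keep their text content."""
    def replace(m):
        closing, name = m.group(1), m.group(2).lower()
        if name not in ELEMENT_MAP:
            return ""
        start, end = ELEMENT_MAP[name]
        return end if closing else start
    return TAG.sub(replace, text)


def convert(text):
    info = first_pass(text)  # e.g., decide whether table files must be produced
    return info, second_pass(text)
```

Tables are only detected here; in the Guide workflow they are handled separately through their own style file and TMF files.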


After conversion to HML, the instances are then converted into Guide format (fname.hml to fname.gui) as follows:

1. Develop a style file for formatting purposes.
2. Run Guide Writer on the instance converted to HML format (fname.hml).
3. Develop a style file for tables.
4. Files with a TMF extension are generated for each table. Table Viewer software takes each fname.tmf (text) file as input and displays the table on-screen (runtime).
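The table step produces one TMF file per table. The TMF layout is not described here, so the sketch below assumes, purely for illustration, a tab-delimited text file named after the instance plus a running table number (for example `fname1.tmf`); the `table`, `row`, and `cell` element names are likewise hypothetical.

```python
import pathlib
import xml.etree.ElementTree as ET


def extract_tables(instance_path, out_dir):
    """Write each <table> in an instance to its own tab-delimited .tmf file.

    Assumed (hypothetical) layout: one line per <row>, cells separated by tabs.
    Returns the list of files written, in document order.
    """
    instance_path = pathlib.Path(instance_path)
    root = ET.parse(instance_path).getroot()
    written = []
    for number, table in enumerate(root.iter("table"), start=1):
        lines = []
        for row in table.iter("row"):
            cells = ["".join(cell.itertext()).strip() for cell in row.findall("cell")]
            lines.append("\t".join(cells))
        target = pathlib.Path(out_dir) / f"{instance_path.stem}{number}.tmf"
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Keeping tables in separate text files lets a dedicated table viewer lay them out at run time instead of forcing the main document format to handle complex table structure.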


Results of FCES Instances Converted to Retrieval Format

Converting FCES Instances into FAIRS Retrieval Format


Table … provides the evaluation of converting FCES instances into FAIRS retrieval format for DISC8 and DISC9. The DISC8 process relied on editors and chunking procedures to manually prepare the information for on-screen display. There … instances.


All the elements were automatically translated into the FAIRS retrieval format because the chart translator creates a rule and a template with an object for each element in the model. The objects were then translated and placed into the FAIRS retrievable database. The translation of the objects and rules directly into the retrievable database is an automatic process for the elements of the FCES model.
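The reason the DISC9 translation needed no manual work is that every element in the model receives its own rule, template, and object. A minimal Python sketch of that one-to-one mapping follows; the rule notation, the dictionary layout, and the `load_database` stand-in for the FAIRS database are assumptions, not the formats used by the Candide chart translator.

```python
def translate_model(model):
    """Create one grammar rule and one display template (bound to an object) per element.

    `model` maps each element name to the ordered names of its child elements;
    an empty list marks a text-only element. All formats here are illustrative.
    """
    rules, templates = [], {}
    for element, children in model.items():
        body = " ".join(children) if children else "#PCDATA"
        rules.append(f"{element} -> {body}")
        templates[element] = {"object": element, "slots": list(children)}
    return rules, templates


def load_database(rules, templates):
    """Stand-in for placing the translated rules and objects into a retrievable store."""
    return {"rules": list(rules), "objects": dict(templates)}


def untranslated_elements(model, database):
    """Elements that would still need manual work (always empty under this scheme)."""
    return sorted(set(model) - set(database["objects"]))
```

Because the mapping is total by construction, coverage does not depend on an editor remembering to prepare each element, which is the contrast drawn with the DISC8 process.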


Table … shows that the translation of elements into DISC8 format was a manual process, while translation to DISC9 format was a totally automated process.


Converting FCES Instances into Multimedia Viewer Format


Table … provides some translation comments on the conversion of FCES instances to Multimedia Viewer format. The process from FCES instances to Multimedia Viewer format was primarily an automated process. Table … shows that superscript and subscript were the only elements that Multimedia Viewer could not translate into retrieval system format.


Converting FCES Instances into Guide Format


Table … provides some translation comments on the conversion of FCES instances to Guide format, which was an automated process. Table … shows that Guide could translate …





Interpretation of Elements by Automated Retrieval Systems


Software was written to automate the translation of elements into each retrieval system format (DISC9, Multimedia Viewer, and Guide) for on-screen display. All the extremely important elements were translated to the three retrieval systems. All the somewhat important elements were translated to the DISC9 and Guide retrieval systems. The Multimedia Viewer retrieval system cannot display superscript or subscript characters.
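The comparison above amounts to checking each ranked element against what each retrieval system can display. The Python sketch below encodes only what the text states (Multimedia Viewer cannot display superscript or subscript; DISC9 and Guide handled all elements); the element names `sup` and `inf` and the small priority table are illustrative assumptions.

```python
# Hypothetical priority ranking; element names are illustrative only.
PRIORITY = {
    "title": "extremely important",
    "p": "extremely important",
    "sup": "somewhat important",   # superscript
    "inf": "somewhat important",   # subscript (inferior)
}

# Per the text, superscript and subscript were the only elements
# Multimedia Viewer could not display; DISC9 and Guide handled all elements.
UNSUPPORTED = {
    "DISC9": set(),
    "Multimedia Viewer": {"sup", "inf"},
    "Guide": set(),
}


def untranslated(system):
    """Ranked elements that `system` cannot display, mapped to their priority."""
    return {e: PRIORITY[e] for e in sorted(PRIORITY) if e in UNSUPPORTED[system]}


def coverage_report():
    """Untranslated elements for every retrieval system."""
    return {system: untranslated(system) for system in UNSUPPORTED}
```

Reporting the priority next to each gap makes it easy to judge whether a system's limitation matters for a given publication.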


Possible Model Changes


Simplification of the model can reduce both the knowledge authors need to use FAST-WP and the redundancy of elements in FCES instances. There are many elements that "ride" along in multiple content models as holders for other elements. Some of these elements might include front matter (fm), title group (tig), figure reference (fgr), geometric formula (f), author (au), publisher's front matter (pubfm), copyright notice date group (crd), copyright notice (crt), complex table head (cthd), row/column dimensions for simple and complex tables (tdim), and complex table header (cth). The content models of each of these elements could themselves be represented directly in the model.
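Representing a rider element's content model directly in its parent amounts to unwrapping the element: its children, and any text it holds, move up into the parent in the same order. The Python sketch below does this on an XML-compatible instance with `xml.etree.ElementTree`; treating `fm` and `tig` as riders is only an example, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, siblings, text):
    """Attach `text` after the last collected sibling, or to the parent's leading text."""
    if not text:
        return
    if siblings:
        siblings[-1].tail = (siblings[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def unwrap(parent, riders):
    """Replace every descendant whose tag is in `riders` with that element's own content."""
    new_children = []
    for child in list(parent):
        unwrap(child, riders)  # flatten nested riders first
        if child.tag in riders:
            _append_text(parent, new_children, child.text)  # text before the rider's first child
            new_children.extend(list(child))                # the rider's children move up
            _append_text(parent, new_children, child.tail)  # text that followed the rider
        else:
            new_children.append(child)
    parent[:] = new_children
```

For example, `<art><fm><tig><ti>Citrus</ti></tig><au>Smith</au></fm></art>` becomes `<art><ti>Citrus</ti><au>Smith</au></art>`: the information is unchanged, but authors no longer need to know about two wrapper elements.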













CHAPTER
SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS


Summary

The objective of this research was to model the structure and represent the information of FCES publications in an electronic form that is independent of any specific computer hardware or software. The initial research began with a graduate course in SGML principles and practices that provided a starting point for the author's document analysis of FCES publications. Document analysis proved to be an ongoing process throughout the research.


The analysis also illustrated the difficulty of attaining agreement on a model for a set of FCES publications from two or more groups. To forestall possible disagreements on FCES document structure, the author proposed to adopt an outside standard rather than develop an in-house model. After a review of current American publication models, coupled with the author's document analysis, the author selected the Association of American Publishers' (AAP) Article model as the best representation of the structure of FCES publications.


The author initially developed a subset of the AAP Article model and later added structural elements unique to FCES publications (Appendix …). Removal of the unique elements would produce a model compatible with the AAP Article DTD.


An in-house publishing tool (FAST-WP) was based, in part, on the FCES model developed by the author. The generic styles in FAST-WP reflect the structure of the FCES model. The author used FAST-WP as the publishing tool to tag the subset of FCES publications. Once the FCES publications were tagged, software (WP2SGML) was developed to convert each in-house publication into SGML format (an instance). The author helped develop the structural conversions that WP2SGML uses in the conversion process. These structural conversions were based on the FCES model.
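A structural conversion of the kind described here maps each word-processor paragraph style to an element and infers containers that the styles do not mark explicitly, such as sections and lists. The Python sketch below is hypothetical and is not WP2SGML: the FAST-WP style names and the target element names are assumptions, and an unmapped style raises `KeyError`.

```python
from xml.sax.saxutils import escape

# Hypothetical FAST-WP style names mapped to hypothetical FCES element names.
STYLE_TO_TAG = {"Title": "ti", "Heading1": "st", "Body": "p", "ListItem": "li"}


def wp_to_sgml(paragraphs):
    """Convert (style, text) pairs into tagged text, inferring <sec> and <list> containers."""
    out = []
    in_section = False
    in_list = False
    for style, text in paragraphs:
        tag = STYLE_TO_TAG[style]             # KeyError for an unmapped style
        if in_list and style != "ListItem":   # a non-list paragraph closes the list
            out.append("</list>")
            in_list = False
        if style == "Heading1":               # each heading starts a new section
            if in_section:
                out.append("</sec>")
            out.append("<sec>")
            in_section = True
        if style == "ListItem" and not in_list:
            out.append("<list>")
            in_list = True
        out.append(f"<{tag}>{escape(text)}</{tag}>")  # escape <, >, & in the text
    if in_list:
        out.append("</list>")
    if in_section:
        out.append("</sec>")
    return "".join(out)
```

Because the containers are inferred from style order, the authors only apply generic styles; they never type section or list tags themselves.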


The author then used a parser to verify that each FCES instance conformed to the ISO 8879-1986 standard and the FCES model. This verification showed that the FCES model describes the content of FCES publications.
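The verification step checks that each element's children match its declared content model. The sketch below shows one common technique, translating an SGML-style content model into a regular expression over child element names. It is not the parser the author used: it supports only sequence (`,`), choice (`|`), grouping, and the `?`, `*`, and `+` occurrence indicators (not `&` or mixed content), and the model strings in the usage example are hypothetical. Leaf elements are given an empty model, meaning no child elements.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"\s*([A-Za-z][\w.-]*|[(),|?*+])")


def model_to_regex(model):
    """Translate a content model such as '(fm, bdy, bm?)' into a compiled regex."""
    parts = []
    model = model.strip()
    pos = 0
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if not m:
            raise ValueError(f"cannot parse content model at {model[pos:]!r}")
        tok = m.group(1)
        pos = m.end()
        if tok == "(":
            parts.append("(?:")
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue  # sequence: adjacent sub-patterns already mean "followed by"
        elif tok in {"|", "?", "*", "+"}:
            parts.append(tok)
        else:
            # Each element name is matched with a trailing space so 'p' cannot match 'pubfm'.
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def validate(element, models):
    """Return a list of error messages; an empty list means the tree conforms."""
    errors = []
    for node in element.iter():
        if node.tag not in models:
            errors.append(f"undeclared element <{node.tag}>")
            continue
        children = "".join(f"{child.tag} " for child in node)
        if not models[node.tag].fullmatch(children):
            errors.append(f"<{node.tag}> content {children.split()} does not match its model")
    return errors
```

Once every instance validates against the same models, any program that understands those models can process every instance, which is the property the conversions to the retrieval systems rely on.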


After conversion of the tagged FCES publications into instances, the author ranked each element of the FCES model by its priority level for on-screen display. The author then determined how the elements of the FCES model would appear on-screen.


Software was then written to convert FCES instances into FAIRS DISC, Multimedia Viewer, and Guide retrieval system format. All the elements of the FCES model were automatically translated to FAIRS DISC and Guide retrieval system format.


The automatic translation of FCES publications from SGML format into FAIRS DISC, Multimedia Viewer, and Guide retrieval formats verifies that the FCES model is application-independent.


Conclusions/Findings


After converting a set of FCES publications into several retrieval system formats, the following conclusions were drawn:

1. SGML was a suitable method of rules and syntax for modeling FCES publications.
2. The FCES model was validated against ISO standard 8879-1986.
3. The model provided the structure for automatic conversion of FCES instances into retrieval system format.
4. The model was verified to be a suitable way of describing the information in FCES publications for the automatic conversion of FCES instances into retrieval system format.


Observations


Other findings that were observed during the research process, but that were not part of the project, include the following:

1. Document analysis is an ongoing process that affects the preparation time and the degree of automation of the conversion process.
2. Prior to development of the DTD, decide who has the authority to change the SGML model (DTD).
3. Application-independent document storage revolves around the DTD. The DTD is the single most important aspect of any SGML project.
4. Previously converted publications must be revalidated when there are any changes to the model.
5. When developing the FCES model, the more rigid the structure, the easier the implementation.
6. Use an existing model as a starting point to describe the structure of publications.
7. A subset of an existing model provides greater access to the information.
8. Eliminate any "rider" elements that tag along with other elements. If possible, use only one element to describe the information.
9. Avoid recursive content models to simplify automated conversion from word processing files to SGML format (instance).
10. The specific retrieval software(s) used is not important. A high priority when selecting a retrieval system should be what feature(s) are required for interpreting information in an instance.
11. Additional conversion time is required when a retrieval …









Recommendations


1. Continue document analysis of FCES publications, refining structural properties to decrease the author's knowledge requirement in the process.
2. Update the model as publications change, ensuring compatibility between current and older FCES publications.
3. Generate reports when there are any changes or revisions to the model.
4. Design new templates for authors in the tagging process; for example, a document template that has blanks to fill in.
5. Develop tools to automatically fill in the front matter of a publication and to create hyperlink references from FCES publications to tables and graphics.
6. Continue reviewing retrieval software as new applications become available.
7. Link multiple software packages, such as authoring, conversion, error checking, validation, and SGML retrieval software, into a seamless interface tool for the author.
8. Provide on-line documentation and a help facility for the entire document conversion process.
9. Support a wider variety of word processors.
10. Minimize or eliminate as much of the manual tagging process as possible.













GLOSSARY


The ISO 8879-1986 International Standard gives the following definitions for certain names and titles described in this research project:


abstract syntax (of SGML): Rules that define how markup is added to the data of a document, without regard to the specific characters used to represent the markup.

application: Text processing application.

attribute (of an element): A characteristic quality, other than type or content.

attribute definition: A member of an attribute definition list; it defines an attribute name, allowed values, and default value.

attribute definition list: A set of one or more attribute definitions defined by the attribute definition list parameter of an attribute definition list declaration.

base document element: A document element whose document type is the base document type.

base document type: The document type specified by the first document type declaration in a prolog.

CDATA: Character data.

CDATA entity: Character data entity.








character: An atom of information with an individual meaning, defined by a character repertoire.

comment: A portion of a markup declaration that contains explanations or remarks intended to aid persons working with the document.

concrete syntax (of SGML): A binding of the abstract syntax to particular delimiter characters, quantities, markup declaration names, etc.

conforming SGML application: An SGML application that requires documents to be conforming SGML documents, and whose documentation meets the requirements of this International Standard.

conforming SGML document: An SGML document that complies with the provisions of this International Standard.

content: Characters that occur between the start-tag and end-tag of an element in a document instance. They can be interpreted as data, proper subelements, included subelements, other markup, or a mixture of them.
NOTE - If an element has an explicit content reference, or its declared content is "EMPTY", the content is empty. In such cases, the application itself may generate data and process it as though it were content data.

(content) model: Parameter of an element declaration that …









core concrete syntax: A variant of the reference concrete syntax that has no short reference delimiters.

data: The characters of a document that represent the inherent information content; characters that are not recognized as markup.

data content: The portion of an element's content that is data rather than markup or a subelement.

data tag: A string that conforms to the data tag pattern of an open element. It serves both as the end-tag of the open element and as character data of the element that contains it.

declaration: Markup declaration.

declaration subset: A delimited portion of a markup declaration in which other declarations can occur.
NOTE - Declaration subsets occur only in document type, link type, and marked section declarations.

delimiter characters: A character class that consists of each SGML character, other than a name character or function character, that occurs in a string assigned to a delimiter role by the concrete syntax.

delimiter set: A set of assignments of delimiter strings to the abstract syntax delimiter roles.

delimiter (string): A character string assigned to a delimiter role by the concrete syntax.
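As a sketch of how the abstract syntax (delimiter roles) is bound to a concrete syntax (delimiter strings), the Python below uses a subset of the reference concrete syntax delimiters. The function names are illustrative, and the alternative delimiter set in the usage note is hypothetical; it exists only to show that the same abstract markup can be written with different characters.

```python
# A subset of the reference concrete syntax: delimiter role -> delimiter string.
REFERENCE_DELIMITERS = {
    "STAGO": "<",    # start-tag open
    "ETAGO": "</",   # end-tag open
    "TAGC": ">",     # tag close
    "MDO": "<!",     # markup declaration open
    "MDC": ">",      # markup declaration close
    "COM": "--",     # comment start or end within a declaration
    "VI": "=",       # value indicator
    "LIT": '"',      # literal start or end
    "ERO": "&",      # entity reference open
    "REFC": ";",     # reference close
}


def start_tag(name, attrs=None, d=REFERENCE_DELIMITERS):
    """Build a start-tag, with optional attributes, from delimiter roles."""
    attrs = attrs or {}
    parts = [name] + [f"{k}{d['VI']}{d['LIT']}{v}{d['LIT']}" for k, v in attrs.items()]
    return d["STAGO"] + " ".join(parts) + d["TAGC"]


def end_tag(name, d=REFERENCE_DELIMITERS):
    return d["ETAGO"] + name + d["TAGC"]


def comment_declaration(text, d=REFERENCE_DELIMITERS):
    return d["MDO"] + d["COM"] + " " + text + " " + d["COM"] + d["MDC"]


def entity_ref(name, d=REFERENCE_DELIMITERS):
    return d["ERO"] + name + d["REFC"]


def delimiter_characters(d=REFERENCE_DELIMITERS):
    """Every character occurring in some delimiter string (none are name characters here)."""
    return sorted(set("".join(d.values())))
```

With a hypothetical set such as `dict(REFERENCE_DELIMITERS, STAGO="[", ETAGO="[/", TAGC="]")`, `start_tag("p", d=...)` yields `[p]` instead of `<p>`: the abstract syntax is unchanged, only its concrete binding differs.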


descriptive markup: Markup that describes the structure and … In particular, it uses tags to express the element structure.


document: A collection of information that is processed as a unit. A document is classified as being of a particular document type.
NOTE - In this International Standard, the term almost invariably means (without loss of accuracy) an SGML document.

document character set: The character set used for all markup in an SGML document, and initially (at least) for data.
NOTE - When a document is interchanged between systems, its character set is translated to the receiving system character set.

document element: The element that is the outermost element of an instance of a document type; that is, the element whose generic identifier is the document type name.

document instance: Instance of a document type.

document type: A class of documents having similar characteristics; for example, journal, article, technical manual, or memo.

(document) type declaration: A markup declaration that contains the formal specification of a document type definition.

document (type) definition: Rules, determined by an application, that apply SGML to the markup of documents of a particular type. A document type definition includes a formal specification, expressed in a document type declaration, of the element types, element relationships and attributes, and references that can be represented by markup. It thereby defines the vocabulary of the markup for which SGML defines the syntax.
NOTE - A document type definition can also include comments that describe the semantics of elements and attributes, and any application conventions.


DTD: Document type definition.

element: A component of the hierarchical structure defined by a document type definition. It is identified in a document instance by descriptive markup, usually a start-tag and end-tag.
NOTE - An element is classified as being of a particular element type.

element declaration: A markup declaration that contains the formal specification of the part of an element type definition that deals with the content and markup minimization.

element structure: The organization of a document into hierarchies of elements, with each hierarchy conforming to a different document type definition.

element type: A class of elements having similar characteristics; for example, paragraph, chapter, abstract, footnote, or bibliography.