Hope and Avoiding Horror: real-life TEI the CMS/TEI/XSL/HTML stack using TEI with Sobek

Material Information

Title:
Hope and Avoiding Horror: real-life TEI the CMS/TEI/XSL/HTML stack using TEI with Sobek
Physical Description:
Presentation slides
Language:
English
Creator:
Krause, Miller
Publisher:
George A. Smathers Libraries, University of Florida
Place of Publication:
Gainesville, FL
Publication Date:

Subjects

Subjects / Keywords:
Digital Humanities
Digital Humanities Library Group
Digital Scholarship
SobekCM

Notes

Abstract:
Case study presentation: TEI encoding with the Latin histories of Florida. Miller Krause, Classics (June 24, 2014, in Library West 419).

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright by Miller Krause. Permission granted to University of Florida to digitize and display this item for research and educational uses. Permission to reuse, publish or reproduce this item for purposes other than what is allowed by fair use or other copyright exemptions must be obtained from the copyright holder.
System ID:
AA00024211:00001


Full Text

PAGE 1

hope and horror
real-life TEI
the CMS/TEI/XSL/HTML stack
using TEI with Sobek

PAGE 2

executive summary
TEI is the Right Way to transcribe books/texts
TEI is a data model only; it does not provide logic or view
TEI alone is not legible; it needs processing (XSL)
SobekCM can host TEI today, but not well:
  default XSL stylesheets would help
  it should allow styled TEI to be a principal view for an item, not just metadata

PAGE 3

real-life project
sixteenth-century Latin histories of Florida

PAGE 4

What do we have?
  Books: sixteenth-century histories of Florida written in Latin
Whom can we help?
  Neo-Latin scholars
  Historians of Florida
  Latin teachers and students
What's the issue?
  They're image-based PDFs
  Only we have them
  No translations or commentaries
Solutions?
  Transcribe to a text-based format
  Collect and publish in a CMS
  Add translations and commentary
What technology can help?

PAGE 5

exuberant hope

PAGE 6

What is TEI? (in bullet points)
Standard conventions for transcribing books/texts
in a text-based, searchable format
that computers understand* and manipulate*
that any* web browser can display*
with more* descriptive detail than plain HTML
designed for multiple text streams* like notes
*herein lies previously undisclosed horror

PAGE 7

[Diagram: markup languages on a timeline (80s, 90s, 00s, 10s), ranged from more texty to more databasy: Standard Generalized Markup Language (SGML), old HTML, old TEI, XML, XHTML, HTML 5, TEI, RSS, iTunes Library]
for our simple purposes, TEI is a kind of XML

PAGE 8

Exemplar Caesarei Privilegii

Rudolphus II. Divina favente clementia electus Romanorum imperator semper Augustus Germaniae Hungariae, Bohemiae, Dalmatiae, Croatiae Sclavoniae, etc. Rex, Archidux Austriae blah blah blah

TEI looks like HTML but with different/more tags
easy to learn

PAGE 9

<pb facs="./images/DeBry1591_Page_006.jpg"/>
<persName ref="http://thesaurus.cerl.org/record/cnp01467224">
  <foreName>Rudolphus</foreName> <genName>II.</genName>
  <addName type="office">Divina favente clementia electus <orgName>Romanorum</orgName> Imperator,</addName>
  <addName type="nobility">semper Augustus,</addName>
  <addName type="nobility">
    <placeName>Germaniae</placeName>, <placeName>Hungariae</placeName>, <placeName>Bohemiae</placeName>,
    <placeName>Dalmatiae</placeName>, <placeName>Croatiae</placeName>, <placeName>Sclavoniae</placeName>
    <abbr>etc.</abbr> Rex,
  </addName>
  <addName type="nobility">Archidux <placeName>Austriae</placeName></addName>
</persName>
blah blah blah

possibly many more tags
discuss editorial standards before you begin
this isn't even OCD
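As a rough illustration of what this level of tagging buys, here is a minimal sketch in Python (standard library only) that lists the places named in a tagged passage. The wrapping <p> element, the TEI namespace declaration, and the exact placement of closing tags are assumptions made for the example; they are not taken from the project's files.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Abridged, hypothetical version of the encoded privilege text.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
<persName ref="http://thesaurus.cerl.org/record/cnp01467224">
  <foreName>Rudolphus</foreName> <genName>II.</genName>
  <addName type="nobility">
    <placeName>Germaniae</placeName>, <placeName>Hungariae</placeName>,
    <placeName>Bohemiae</placeName> <abbr>etc.</abbr> Rex,
  </addName>
  <addName type="nobility">Archidux <placeName>Austriae</placeName></addName>
</persName>
</p>"""


def place_names(tei_xml):
    """Return the text of every TEI placeName element, in document order."""
    root = ET.fromstring(tei_xml)
    return [el.text.strip()
            for el in root.iter("{%s}placeName" % TEI_NS)
            if el.text]


print(place_names(SAMPLE))
```

The same query against plain HTML or a PDF would need guesswork; here the editorial decision to tag places does the work.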

PAGE 10

What can we do with TEI?
1 Preservation
  transcribe the text for future use
  (practically) illegible in the present
2 Presentation
  publish a text for real people to read
3 Digital Tools
  cross-reference multiple text streams
(1 to 3: from more texty to more databasy)

PAGE 11

TEI for preservation
or, first steps with Sobek

PAGE 12

Sobek's strengths are PDFs and graphics
Sobek normally deals with text as PDF, not as text
non-PDF formats get second-class citizenship

PAGE 13

[Screenshot: the files offered for one item. PDF; JPG (actual size; a thumbnail of the PDF?); TXT (273 bytes; content length: zero)]

PAGE 14

Confusing to researcher?

PAGE 15

Why?
Author/editor didn't supply a text-enabled PDF
Author/editor contaminated the pairtree object
No system can eliminate user error
Promoting PDF reduces the visibility of user error
but this makes TEI hard to find

PAGE 16

Primary item view for Tommy Tiptop
Where's the text?

PAGE 17

Page Turner View (presentation via graphics, not text; no TEI involved)

PAGE 18

Where's the text? Under metadata
if not used for presentation, TEI should be a second-class citizen

PAGE 19

Tommy Tiptop: TEI for preservation only, if even that
presentation was handled by graphics: TEI adds no value here
but it is a start
this is what a TEI project needs to do first

PAGE 20

What would help?
Sobek excels at PDFs and graphics: make it treat text-based files equally well?
TEI needs more than a plain data (XML) view: it needs presentation quality if researchers are actually going to read it
Is Sobek maybe not the best fit for a TEI project?

PAGE 21

TEI for presentation and scholarly tools
or, the next levels are not bleeding edge

PAGE 22

TEI for presentation (CHLT, mid-2000s)
main text stream
notes stream
at least it's not raw XML

PAGE 23

TEI in a complex scholarly tool (Perseus, c. 1999)
main text stream
apparatus criticus
translation (+notes)
keyed lexica
legible presentation and useful tool: YAY

PAGE 24

TEI as data model for a scholarly tool (how things work today)
main text stream
textual variants
notes and commentary

PAGE 25

creeping horror
how do we go from illegible TEI to a legible page?

PAGE 26

Document Type: HTML | TEI
data model: test (inside an HTML tag) | test (inside a TEI tag)
without more info, browser thinks:
  HTML: you'd like that test displayed in italics
  TEI: your moonspeak means nothing to me
view: test (in italics) | test (unstyled)
solution: use XSL to translate TEI into HTML (or CSS)
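The model-to-view translation can be sketched outside XSL as well. The following is a Python stand-in (standard library only), not an XSL stylesheet: it walks a TEI tree and emits HTML elements according to a small lookup table. The table itself, the choice of <title> as the TEI tag, and the "tei-" class prefix are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI element names to HTML element names.
TAG_MAP = {"p": "p", "title": "i", "persName": "span", "placeName": "span"}


def tei_to_html(node):
    """Recursively rebuild a TEI element as an HTML element."""
    local = node.tag.split("}", 1)[-1]           # drop the "{namespace}" prefix
    html = ET.Element(TAG_MAP.get(local, "span"), {"class": "tei-" + local})
    html.text = node.text
    for child in node:
        html_child = tei_to_html(child)
        html_child.tail = child.tail             # keep text that follows the child
        html.append(html_child)
    return html


SAMPLE = '<p xmlns="http://www.tei-c.org/ns/1.0">this is a <title>test</title></p>'

page = ET.tostring(tei_to_html(ET.fromstring(SAMPLE)), encoding="unicode")
print(page)
```

Keeping the original TEI name in a class attribute means CSS can still style each kind of element differently after the translation.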

PAGE 27

[Diagram: from model to view inside the web browser. Labels: HTML document, TEI document, XML document processor (understanding document structure), XSL processor, HTML document processor (understanding specific tag meaning), layout engine, display. Paths: TEI alone; TEI + XSL, giving HTML.]
TEI documents on the web need to be translated to HTML

PAGE 28

What is XSL?
translates XML (to HTML, PDF, Word, LaTeX, XML)
XSL 1.0 is supported by every browser
TEI is trivial
XSL/XPath/namespaces is harder

PAGE 29

<xsl:template match="/" name="htmlShell" priority="99">
  <html>
    <xsl:call-template name="htmlHead"/>
    <body>
      <xsl:if test="$includeToolbox = true()">
        <xsl:call-template name="teibpToolbox"/>
      </xsl:if>
      <xsl:apply-templates/>
      <xsl:copy-of select="$htmlFooter"/>
    </body>
  </html>
</xsl:template>

XSL is a programming language written in XML
you mix HTML/output (black) with XSL commands (colors)

PAGE 30

So, can I use someone else's XSL stylesheets?
TEI Consortium publishes some on GitHub
every web browser understands XSL 1.0
no browser understands XSL 2.0
ergo, TEI Consortium's XSL stylesheets are 2.0
Indiana's Boilerplate is XSL 1.0 but simplistic: it improves Tommy Tiptop, but doesn't handle translations, notes, etc.

PAGE 31

no XSL
calling Boilerplate XSL

PAGE 32

Tommy Tiptop on Boilerplate (at least it's legible)

PAGE 33

copyright page from de Bry's 1591 history of Florida
click on pic for full page image
legible text

PAGE 34

Lesson: going from model to view takes XSL
preservation/transcription? yes we can
presentation of basic text? we're close
addition of translation/other streams? next section
what our project can do now:
  need to create a simple XSL stylesheet
  best bet: hack down Boilerplate
  this could be simpler with a default stylesheet

PAGE 35

what would help now
a default XSL stylesheet on SobekCM
  would free authors/editors from having to write XSL
  would provide a basic, consistent look
  would keep processing client-side (in browser)
  but does that violate pairtree object encapsulation?

PAGE 36

going from presentation to tools
that Latinists actually use

PAGE 37

classicists' tools use lots of text streams
Loeb (Harvard) editions: text, translation, textual notes, translator's notes
Teubner editions: text, apparatus fontium, apparatus criticus
asynchronous streams: TOC, introduction, sigla, commentary, index, index nominum, index locorum

PAGE 38

parallel Greek text
parallel English translation
translator's notes
textual notes
standard accessibility for classicists: one opening, four synchronous data streams
notes keyed to words, parallel texts to each other

PAGE 39

Latin poem
prose summary
apparatus criticus
editor's notes
plus endnotes much later
old style: in usum Delphini
four synchronous streams, one asynchronous
ancillary streams keyed to certain words or lines

PAGE 40

Rudolphus II. Divina favente clementia electus Romanorum Imperator, semper Augustus, Germaniae Hungariae Bohemiae, Dalmatiae, Croatiae, Sclavoniae etc. Rex, Archidux Austriae, blah blah blah blah blah

Rudolph II, elected Emperor of the Romans with Divine Clemency assenting, forever Augustus, king of Germany, Hungary, Bohemia, Dalmatia, Croatia, Slavonia, etc., Archduke of Austria, blah blah blah blah blah

text | <linkGrp type="translation"> | translation
standoff link table (TEI convention) (= poor man's relational database)
syncing text and translation
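One way a program might read such a link table is sketched below in Python (standard library only). The <seg> elements, the xml:id values (la1, en1, ...) and the shortened sample sentences are hypothetical; only the linkGrp of type "translation" comes from the slide. Each <link> carries a target attribute listing the IDs it joins, and a link is skipped unless both IDs are actually present in the document.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, abridged document: two text segments, two translations,
# and a standoff table saying which belongs with which.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <div type="text" xml:lang="la">
      <seg xml:id="la1">Rudolphus II.</seg>
      <seg xml:id="la2">Archidux Austriae</seg>
    </div>
    <div type="translation" xml:lang="en">
      <seg xml:id="en1">Rudolph II</seg>
      <seg xml:id="en2">Archduke of Austria</seg>
    </div>
    <linkGrp type="translation">
      <link target="#la1 #en1"/>
      <link target="#la2 #en2"/>
    </linkGrp>
  </body>
</text>"""


def aligned_pairs(tei_xml):
    """Return (text, translation) string pairs named by the link table."""
    root = ET.fromstring(tei_xml)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    pairs = []
    for group in root.iter(TEI + "linkGrp"):
        if group.get("type") != "translation":
            continue
        for link in group.iter(TEI + "link"):
            ids = [t.lstrip("#") for t in link.get("target", "").split()]
            # Only use links whose two IDs both exist in the document.
            if len(ids) == 2 and all(i in by_id for i in ids):
                pairs.append(tuple("".join(by_id[i].itertext()).strip()
                                   for i in ids))
    return pairs


for latin, english in aligned_pairs(SAMPLE):
    print(latin, "=", english)
```

The table works like a join between two rows of a database, which is the sense of "poor man's relational database".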

PAGE 41

can it be done?
use XSL conditionals: iterate over linkGrp when IDs are present
use JavaScript:
  put main text in HTML body, hide the rest in invisible divs or iframes or script
  have onLoad() format things
doesn't matter!
  conventions are for the data model (TEI)
  view logic can be kludged as hard as needed
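The JavaScript route might look roughly like the page produced by this Python sketch (standard library only). The element IDs, the function name showStreams, and the decision to simply reveal the translation on load are all assumptions for illustration; a real tool would do more formatting at that point.

```python
from html import escape


def build_page(main_text, translation):
    """Return an HTML page: main text visible, translation in a hidden div."""
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>text and translation</title></head>
<body onload="showStreams()">
<div id="main">{escape(main_text)}</div>
<div id="translation" hidden>{escape(translation)}</div>
<script>
// Runs once the page has loaded: reveal the hidden stream.
function showStreams() {{
  document.getElementById("translation").hidden = false;
}}
</script>
</body>
</html>"""


page = build_page("Archidux Austriae", "Archduke of Austria")
print(page)
```

Nothing here touches the TEI conventions; it is view logic only, so it can be replaced without re-encoding the text.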

PAGE 42

should it be done?
SobekCM organizes and catalogues
better to keep text and translation separate?
presentable texts should go in SobekCM
should complex tools go there too? maybe not

PAGE 43

recap

PAGE 44

TEI is one part of a stack

                      Our Intentions    Reuse by Others
content management    SobekCM           SobekCM
raw data model        TEI               TEI
presentation logic    XSL, CSS          someone else's XML parser
manipulation logic    JavaScript        someone else's program
end use               HTML (browser)    someone else's project

PAGE 45

staffing
as things stand with TEI and SobekCM:
  TEI authors/editors alone can't produce presentable materials or scholarly tools, just data
  need a skilled XSL (and CSS/JS) coder on project staff to make anything legible, let alone shiny/useful
but: default XSL (+CSS/JS?) stylesheets on SobekCM would change the game for simple projects

PAGE 46

Default XSL?
Develop a documented default TEI XSL stylesheet (or core of stylesheets) to cover common use cases?
Respect calls to bespoke XSL within the same pairtree object so that authors can develop complex TEI.
Inject (server-side) a default stylesheet call if none exists, so that Tommy Tiptop never happens?
Consider doing this in presentation logic to preserve pairtree object encapsulation?
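A minimal sketch of the inject-if-missing idea, in Python (standard library only). It is not SobekCM code: the default stylesheet path is hypothetical, and the check is a plain regular expression, so it would also be fooled by a stylesheet call that appears inside a comment. A document that already calls its own stylesheet is returned unchanged; otherwise an xml-stylesheet processing instruction is added after the XML declaration, leaving the stored file itself untouched if this runs at serving time.

```python
import re

# Hypothetical location of a site-wide default stylesheet.
DEFAULT_XSL = "/default/tei-default.xsl"

STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\s.*?\?>", re.S)
XML_DECL = re.compile(r"^\s*<\?xml\s.*?\?>", re.S)


def ensure_stylesheet(tei_xml, href=DEFAULT_XSL):
    """Add a default stylesheet call unless the document already has one."""
    if STYLESHEET_PI.search(tei_xml):
        return tei_xml                    # respect the bespoke stylesheet call
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>\n'
    decl = XML_DECL.match(tei_xml)
    if decl:
        # The XML declaration must stay first, so insert right after it.
        rest = tei_xml[decl.end():].lstrip("\n")
        return tei_xml[:decl.end()] + "\n" + pi + rest
    return pi + tei_xml


doc = '<?xml version="1.0"?>\n<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'
served = ensure_stylesheet(doc)
print(served)
```

Applying the function twice gives the same result as applying it once, so it is safe to run on every request.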

PAGE 47

TEI: what class citizen?
Allow TEI to be the principal view for an item only if it adds value (presentation/tools).
Default TEI to be ancillary metadata if it adds no value (preservation only: Tommy Tiptop).
Do not index XSL, CSS, or JS called from TEI files: they shouldn't be treated as catalogued items.

PAGE 48

Preservation only
  Metadata and structural markup
  Not necessarily legible
  Tommy Tiptop
  TEI suffices

Presentation for Skilled Latinists
  Legible text (for those who read Latin)
  needs basic XSL/CSS
  Boilerplate or some vanilla XSL would suffice

Public Accessibility and Scholarly Tools
  Ancillary material: apparatus, translation, commentary, indices, popover glosses, etc.
  needs moar TEI and advanced XSL, CSS and JS
  needs new host