hope and horror
real-life TEI
the CMS/TEI/XSL/HTML stack
using TEI with Sobek
executive summary
TEI is the Right Way to transcribe books/texts
TEI is a data model only; it does not provide logic or view
TEI alone is not legible; it needs processing (XSL)
SobekCM can host TEI today, but not well:
  default XSL stylesheets would help
  should allow styled TEI to be a principal view for an item, not just metadata
real-life project
sixteenth-century Latin histories of Florida
What do we have?
  Books: sixteenth-century histories of Florida written in Latin
Whom can we help?
  Neo-Latin scholars
  Historians of Florida
  Latin teachers and students
What's the issue?
  They're image-based PDFs
  Only we have them
  No translations or commentaries
Solutions?
  Transcribe to a text-based format
  Collect and publish in a CMS
  Add translations and commentary
What technology can help?
What is TEI? (in bullet points)
Standard conventions for transcribing books/texts
in a text-based searchable format
that computers understand* and manipulate*
that any* web browser can display*
with more* descriptive detail than plain HTML
designed for multiple text streams* like notes
*herein lies previously undisclosed horror
[Diagram: timeline of markup languages across the 80s, 90s, 00s, and 10s, ranged from "more texty" to "more databasy": Standard Generalized Markup Language (SGML), old HTML, old TEI, XML, XHTML, HTML 5, TEI, iTunes Library, RSS]
for our simple purposes, TEI is a kind of XML
Exemplar Caesarei Privilegii
TEI looks like HTML but with different/more tags
easy to learn
Rudolphus II. Divina favente clementia electus Romanorum imperator semper Augustus Germaniae Hungariae, Bohemiae, Dalmatiae, Croatiae Sclavoniae, etc. Rex, Archidux Austriae blah blah blah
<pb facs="./images/DeBry1591_Page_006.jpg" />
<persName ref="http://thesaurus.cerl.org/record/cnp01467224"> <forename>
possibly many more tags
discuss editorial standards before you begin
this isn't even OCD
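To make the "looks like HTML" point concrete, here is a minimal sketch of how the opening of the privilege might be encoded. Only the `pb` and `persName` attributes come from the slides; the `div` wrapper, its `type` value, and the use of `genName` are assumptions for illustration, not the project's actual editorial standard.

```xml
<!-- Illustrative sketch: element choices below are assumptions, not the project's encoding -->
<div xmlns="http://www.tei-c.org/ns/1.0" type="privilege" xml:lang="la">
  <!-- page break, pointing at the scanned page image -->
  <pb facs="./images/DeBry1591_Page_006.jpg"/>
  <head>Exemplar Caesarei Privilegii</head>
  <p>
    <!-- ref links the name to an authority record -->
    <persName ref="http://thesaurus.cerl.org/record/cnp01467224">
      <forename>Rudolphus</forename> <genName>II.</genName>
    </persName>
    Divina favente clementia electus Romanorum imperator semper Augustus
  </p>
</div>
```

How deeply to tag names, titles, and places is exactly the kind of editorial standard worth settling before transcription begins.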
What can we do with TEI?
1. Preservation: transcribe the text for future use; (practically) illegible in the present
2. Presentation: publish a text for real people to read
3. Digital Tools: cross-reference multiple text streams
(1 is more texty, 3 is more databasy)
TEI for preservation
or, first steps with Sobek
Sobek's strengths are PDFs and graphics
Sobek normally deals with text as PDF, not as text
non-PDF formats get second-class citizenship
[Screenshot: item files listed as PDF, JPG, and TXT, with annotations: actual size; thumbnail of PDF?; 273 bytes; Content length: zero]
Confusing to researcher?
Why?
Author/editor didn't supply text-enabled PDF
Author/editor contaminated pairtree object
No system can eliminate user error
Promoting PDF reduces visibility of user error
but this makes TEI hard to find
Primary item view for Tommy Tiptop
Where's the text?
Page Turner View (presentation via graphics, not text; no TEI involved)
Where's the text? Under metadata
if not used for presentation, TEI should be a second-class citizen
Tommy Tiptop: TEI for preservation only, if even that
presentation was handled by graphics: TEI adds no value here
but it is a start
this is what a TEI project needs to do first
What would help?
Sobek excels at PDFs and graphics: make it treat text-based files equally well?
TEI needs to be more than a plain data (XML) view: there needs to be presentation quality if researchers are actually going to read it
Is Sobek maybe not the best fit for a TEI project?
TEI for presentation and scholarly tools
or, the next levels are not bleeding edge
TEI for presentation (CHLT, mid-2000s)
[Screenshot labels: main text stream; notes stream]
at least it's not raw XML
TEI in a complex scholarly tool (Perseus, c. 1999)
[Screenshot labels: main text stream; apparatus criticus; translation (+notes); keyed lexica]
legible presentation and useful tool: YAY
TEI as data model for a scholarly tool (how things work today)
[Screenshot labels: main text stream; textual variants; notes and commentary]
creeping horror
how do we go from illegible TEI to a legible page?
Document Type: HTML | TEI
data model: test (in HTML tags) | test (in TEI tags)
without more info, browser thinks: "you'd like that test displayed in italics" | "your moonspeak means nothing to me"
view: test (italic) | test (unstyled)
solution: use XSL to translate TEI into HTML (TEI tags into HTML tags or CSS)
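That translation step can be sketched as a very small XSLT 1.0 stylesheet. It assumes, purely for illustration, that the TEI source marks the word as `<hi rend="italic">test</hi>` and that the desired HTML is `<i>test</i>`; a real project would pick its own mapping (or emit a CSS class instead).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch under assumptions: the rend value "italic" and the <i> target are illustrative choices -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- A TEI highlight flagged as italic becomes an HTML <i> element -->
  <xsl:template match="tei:hi[@rend='italic']">
    <i><xsl:apply-templates/></i>
  </xsl:template>

</xsl:stylesheet>
```

Elements with no matching template fall through to XSLT's built-in rules, which copy their text content without markup. Note the `tei:` prefix: TEI elements live in a namespace, and forgetting it is a common reason a template silently matches nothing.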
[Diagram, model to view, all inside the web browser layout engine: HTML document → HTML document processor → display; TEI document → XML document processor → XSL processor (+XSL) → HTML document processor → display. Labels: understanding document structure; understanding specific tag meaning.]
TEI documents on the web need to be translated to HTML
What is XSL?
translates XML (to HTML, PDF, Word, LaTeX, XML)
XSL 1.0 is supported by every browser
TEI is trivial
XSL/XPath/namespaces is harder
<xsl:template match="/" name="htmlShell" priority="99">
  <xsl:call-template name="htmlHead"/>
  <xsl:if test="$includeToolbox = true()">
    <xsl:call-template name="teibpToolbox"/>
  </xsl:if>
  <xsl:apply-templates/>
  <xsl:copy-of select="$htmlFooter"/>
</xsl:template>
XSL is a programming language written in XML
you mix HTML/output (black) with XSL commands (colors)
So, can I use someone else's XSL stylesheets?
TEI Consortium publishes some on GitHub
every web browser understands XSL 1.0
no browser understands XSL 2.0
ergo, TEI Consortium's XSL stylesheets are 2.0
Indiana's Boilerplate is XSL 1.0 but simplistic: improves Tommy Tiptop, doesn't handle translations, notes, etc.
[Screenshots: the same TEI file with no XSL, and calling Boilerplate XSL]
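"Calling" a stylesheet from a TEI file means adding an `xml-stylesheet` processing instruction before the root element, which tells the browser to run the XSL before display. A minimal sketch follows; the `href` value is a hypothetical path, not the location Boilerplate or SobekCM actually uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- href is a hypothetical path; point it at wherever the stylesheet is served -->
<?xml-stylesheet type="text/xsl" href="teibp.xsl"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader and text go here, unchanged -->
</TEI>
```

Without that one line the browser shows the raw XML tree or unstyled text; with it, the browser applies the stylesheet client-side and renders the resulting HTML.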
Tommy Tiptop on Boilerplate (at least it's legible)
copyright page from de Bry's 1591 history of Florida
click on pic for full-page image
legible text
Lesson: going from model to view takes XSL
what our project can do now:
  preservation/transcription? yes we can
  presentation of basic text? we're close
  addition of translation/other streams? next section
need to create simple XSL stylesheet
best bet: hack down Boilerplate
this could be simpler with default stylesheet
what would help now
a default XSL stylesheet on SobekCM
would free authors/editors from having to write XSL
would provide basic, consistent look
would keep processing client-side (in browser)
but does that violate pairtree object encapsulation?
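As a rough idea of scale, a default stylesheet for basic presentation could be quite small. The sketch below is an assumption about what "basic" might cover (an HTML shell, headings, paragraphs, and page-image links); it is not an existing SobekCM or Boilerplate file, and the link text is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical default stylesheet: coverage and output markup are illustrative assumptions -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- HTML shell: title from the TEI header, body from the transcribed text only -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="//tei:titleStmt/tei:title"/></title>
      </head>
      <body>
        <xsl:apply-templates select="/tei:TEI/tei:text"/>
      </body>
    </html>
  </xsl:template>

  <!-- Headings and paragraphs map onto their HTML counterparts -->
  <xsl:template match="tei:head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- A page break with a facsimile becomes a link to the page image -->
  <xsl:template match="tei:pb[@facs]">
    <a href="{@facs}">[page image]</a>
  </xsl:template>

</xsl:stylesheet>
```

Because this is XSLT 1.0, it would run in the browser; anything it does not handle (notes, translations, apparatus) simply falls through as plain text.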
going from presentation to tools
that Latinists actually use
classicists' tools use lots of text streams
Loeb (Harvard) editions: text, translation, textual notes, translator's notes
Teubner editions: text, apparatus fontium, apparatus criticus
asynchronous streams: TOC, introduction, sigla, commentary, index, index nominum, index locorum
[Page image labels: parallel Greek text; parallel English translation; translator's notes; textual notes]
standard accessibility for classicists:
one opening, four synchronous data streams
notes keyed to words, parallel texts to each other
[Page image labels: Latin poem; prose summary; apparatus criticus; editor's notes; plus endnotes much later]
old style: in usum Delphini
four synchronous streams, one asynchronous
ancillary streams keyed to certain words or lines
Rudolphus II. Divina favente clementia electus Romanorum Imperator, semper Augustus, Germaniae Hungariae Bohemiae, Dalmatiae, Croatiae, Sclavoniae etc. Rex, Archidux Austriae, blah blah blah blah blah
Rudolph II, elected Emperor of the Romans with Divine Clemency assenting, forever Augustus, king of Germany, Hungary, Bohemia, Dalmatia, Croatia, Slavonia, etc., Archduke of Austria, blah blah blah blah blah
<linkGrp type="translation">
text, translation, standoff link table (TEI convention) (= poor man's relational database)
syncing text and translation
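A standoff link table pairs units of the text with units of the translation by their identifiers, much like a join table. A minimal sketch: the `xml:id` values, the `div` types, and the placement of the table in `back` are assumptions for illustration, and the fragment has not been validated against a project schema.

```xml
<!-- Sketch under assumptions: ids, div types, and table placement are illustrative -->
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <div type="edition" xml:lang="la">
      <p xml:id="lat1">Rudolphus II. Divina favente clementia electus Romanorum Imperator</p>
    </div>
    <div type="translation" xml:lang="en">
      <p xml:id="eng1">Rudolph II, elected Emperor of the Romans with Divine Clemency assenting</p>
    </div>
  </body>
  <back>
    <div type="alignment">
      <!-- Each link names the two units it pairs, as space-separated pointers -->
      <linkGrp type="translation">
        <link target="#lat1 #eng1"/>
      </linkGrp>
    </div>
  </back>
</text>
```

The two streams stay independent; only the table knows they correspond, so a stylesheet or script has to read the links to display them side by side.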
should it be done?
SobekCM organizes and catalogues
better to keep text and translation separate?
presentable texts should go in SobekCM
should complex tools go there too? maybe not
staffing
as things stand with TEI and SobekCM:
TEI authors/editors alone can't produce presentable materials or scholarly tools, just data
need skilled XSL (and CSS/JS) coder on project staff to make anything legible, let alone shiny/useful
but: default XSL (+CSS/JS?) stylesheets on SobekCM would change the game for simple projects
Default XSL?
Develop a documented default TEI XSL stylesheet (or core of stylesheets) to cover common use cases?
Respect calls to bespoke XSL within the same pairtree object so that authors can develop complex TEI.
Inject (server-side) a default stylesheet call if none exists, so that Tommy Tiptop never happens?
Consider doing this in presentation logic to preserve pairtree object encapsulation?
TEI: what class citizen?
Allow TEI to be the principal view for an item only if it adds value (presentation/tools).
Default TEI to be ancillary metadata if it adds no value (preservation only: Tommy Tiptop).
Do not index XSL, CSS, or JS called from TEI files: they shouldn't be treated as catalogued items.
Preservation only
  Metadata and structural markup
  Not necessarily legible
  Tommy Tiptop
  TEI suffices

Presentation for Skilled Latinists
  Legible text (for those who read Latin)
  needs basic XSL/CSS
  Boilerplate or some vanilla XSL would suffice

Public Accessibility and Scholarly Tools
  Ancillary material: apparatus, translation, commentary, indices, popover glosses, etc.
  needs moar TEI and advanced XSL, CSS and JS
  needs new host