Joey Takeda, SFU DHIL
November 18, 2021
Introduce text encoding, the history of text encoding, and digital editions
Discuss the ideological differences between databases and structured text encoding
Hands-on practice in encoding using the Lyon in Mourning
All presentation materials:
https://sfu.ca/~takeda/teiworkshop
Let's look at an image
Write down a few things about this document; whatever you find interesting
This could be bibliographical, textual, interpretive, et cetera
How to represent data--flat versus hierarchical?
Document-centric vs data-centric
(Ahnert et al. 51) [...] abstract our objects of study into data points that can be entered into a database or spreadsheet.
(Ahnert et al. 51) This does not imply some shared property intrinsic to each of the subjects under study, rather it implies the widespread utility of networks as a lens through which to view many aspects of our shared world.
The document-centric approach takes as its central object the form, content, and structure of the text
It attends to contextual information, hierarchy, and structure (it does not attempt to flatten)
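The contrast between a flat, data-centric record and a hierarchical, document-centric one can be sketched in Python. The letter, its element names, and the field names below are invented for illustration and are not taken from any project's data.

```python
import xml.etree.ElementTree as ET

# Data-centric: one flat row per document, every field at the same level.
# (The record and its field names are hypothetical.)
flat_row = {
    "id": "doc1",
    "author": "A. Writer",
    "date": "1746",
    "text": "Dear Sir, I write in haste.",
}

# Document-centric: the same information left in place inside the text,
# with its structure (opener, paragraph, closer) preserved as nesting.
hierarchical = ET.fromstring(
    '<letter id="doc1">'
    "<opener><salute>Dear Sir,</salute></opener>"
    "<p>I write in haste.</p>"
    "<closer><signed>A. Writer</signed><date>1746</date></closer>"
    "</letter>"
)

# The flat row records who signed; the tree also records where in the
# letter the signature sits (inside the closer).
signed = hierarchical.find("./closer/signed")
print(flat_row["author"], "|", signed.text)
```

Neither shape is wrong; the point of the sketch is only that the tree keeps positional context that the row discards.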
Let's mark up an image
Using the list from above, use the Drawing tool here to draw square boxes around the parts you found interesting
Add tags to identify these components
What do you notice about the boxes? Nesting? Overlapping?
At its core, textual encoding is a way of identifying and differentiating bits of text from other bits of text.
We do this all the time!
Italics for emphasis
Underlining for titles
Bold for extra-emphasis
Quotation marks for outside attribution or skepticism
All capitals to YELL
But these are contextual and local
E.g. different types of punctuation for levels of quotation
And they are subject to varying interpretations
E.g. I think these quotation marks denote a term, but maybe the author is just being sarcastic...
Accessibility
Distribution
Flexibility
Interoperability
Convertibility (i.e. from one format to another)
Analysis (Distant Reading, et cetera)
Answering existing (and asking new) research questions
No essential connection exists between the fields of book history and digital humanities, though they share certain similarities
Historical relationship between book history, bibliography, text encoding, and markup languages
In one sense, bibliography, print history, and textual studies serve as the foundation for the digital humanities (and, arguably, the web in its entirety)
XML is hierarchical
XML is a tree-like structure
And is often described in genealogical terms
XML = eXtensible Markup Language
XML is not a set language unto itself, but a grammar
There is nothing inherent about the function of XML
It is purely a structure--a way of organizing
Anyone can conceive of an XML dialect (i.e. it is extensible)
HTML (HyperText Markup Language: Every website)
KML (Keyhole Markup Language: Google Maps)
RDF (Resource Description Framework: Library catalogues)
SVG (Scalable Vector Graphics: Digital Images)
OOXML (Office Open XML: This presentation, Word documents, et cetera)
Think of the hierarchy of the book:
Book
Chapters
Sections
Paragraphs
Sentences
Words
Letters
<book>
  <chapter>
    <section>
      <paragraph>
        <sentence>
          <word>
            <letter></letter>
          </word>
        </sentence>
      </paragraph>
    </section>
  </chapter>
</book>
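To make the tree shape of the book hierarchy visible, here is a minimal sketch that parses a one-branch version of it with Python's standard `xml.etree.ElementTree` module and prints each element indented by its depth. The `outline` helper is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# One branch of the book hierarchy, from the whole book down to one letter.
BOOK = (
    "<book><chapter><section><paragraph><sentence>"
    "<word><letter>a</letter></word>"
    "</sentence></paragraph></section></chapter></book>"
)

def outline(element, depth=0):
    """Return one (depth, tag) pair per element, parents before children."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

levels = outline(ET.fromstring(BOOK))
for depth, tag in levels:
    # Two spaces of indentation per level of nesting.
    print("  " * depth + tag)
```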
A name enclosed in two pointy brackets marks an element
E.g. <book> would be called the book element
All elements have start and end tags (an empty element can collapse them into one self-closing tag, e.g. <letter/>)
E.g. <book> is the start tag and </book> is the end tag
Elements can also have attributes and each attribute must have a value
E.g. <book type="primary"> has a type attribute with the value of primary
(Think of attributes as you would in everyday life; people don't have "height" or "age" without a value)
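As a small illustration of attributes, the sketch below reads the `type` attribute from the `<book>` example using Python's standard XML parser, and then shows that the parser rejects an attribute written without a value. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book type="primary"><chapter/></book>')

# Attributes are exposed as name/value pairs on the element.
book_type = book.get("type")
print(book.tag, book.attrib)

# Asking for an attribute that is not there returns None.
missing = book.get("genre")

# An attribute with no value is not well-formed XML, so parsing fails.
try:
    ET.fromstring("<book type></book>")
    valueless_accepted = True
except ET.ParseError:
    valueless_accepted = False
print(valueless_accepted)
```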
Elements cannot overlap
<sentence><word>Word1</word></sentence> is right
<sentence><word>Word1</sentence></word> is wrong
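The no-overlap rule is something an XML parser enforces, so it can be checked mechanically. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; `is_well_formed` is a helper name invented here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

nested = "<sentence><word>Word1</word></sentence>"
overlapping = "<sentence><word>Word1</sentence></word>"

print(is_well_formed(nested))       # properly nested
print(is_well_formed(overlapping))  # end tags in the wrong order
```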
Elements nest and use genealogical terms
There is always a root element
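The genealogical vocabulary (root, parent, child, sibling, ancestor) maps directly onto operations on the parsed tree. The sketch below is one way to express them in Python; `ElementTree` does not store parent links, so the `parent_of` lookup table is built by hand, and all names here are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<book><chapter>"
    "<paragraph>One</paragraph><paragraph>Two</paragraph>"
    "</chapter></book>"
)

# Map every element to its parent (the root has no entry).
parent_of = {child: parent for parent in root.iter() for child in parent}

chapter = root.find("chapter")
first, second = chapter.findall("paragraph")

children = [child.tag for child in chapter]                     # children of <chapter>
parent = parent_of[first].tag                                   # parent of the first paragraph
siblings = [el.text for el in parent_of[first] if el is not first]

def ancestors(element):
    """Return the tags of all ancestors, nearest first."""
    names = []
    while element in parent_of:
        element = parent_of[element]
        names.append(element.tag)
    return names

print(root.tag, children, parent, siblings, ancestors(first))
```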
Tonra remarks on the "absence" of "[d]igital scholarly editing and editions" in 18th-century studies and separately comments on the popularity of databases
The TEI (Text Encoding Initiative) is a markup language written in XML
Currently in its 5th major revision (P5 4.1.0)
Used by many projects across the world in many different languages and for many different reasons
Within the noisy market place of the Digital Humanities, the TEI is a kind of senior member, an annoying parental figure for some, a benevolent one for others, something just too old-fashioned even to be considered for others.
Yet, over the last decade, it has become increasingly clear that the TEI is part of what makes the digital humanities happen.
Offers a rich vocabulary and method to encode:
Bibliographic and structural features: page breaks, headers, footers, page numbers, line breaks, divisions, paragraphs, line groups, etc
Interpretative features: stage movement, emphasis, place names, proper names, dialogue direction, etc
Editorial apparatus: hands, witnesses, collation, gaps, additions, deletions, etc
Linguistic features: morphemes, feature structures, orthographic form, etc
Spoken features: incidents, pauses, shifts, "communicative phenomenon", etc
Metadata: various classification schemes, provenance, manuscript description, etc
Note that the TEI is huge (569 elements)
No one uses the entirety of the TEI tagset
Individual projects customize the TEI for their own needs, usually using a small subset of the overall tagset
E.g. Drama projects will use the drama tagset (<sp> for speech, <speaker> for speaker, et cetera) and discard the linguistic/dictionary tagset (<entry> for dictionary entries, <m> for morpheme, etc).
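The idea of working with a small subset can be sketched as a check that reports any element a document uses outside a project's chosen list. The `DRAMA_SUBSET` set and the sample scene are assumptions made up for this example; real TEI projects normally express their customization in a TEI ODD file rather than in a script.

```python
import xml.etree.ElementTree as ET

# Hypothetical element list for a small drama project (illustration only).
DRAMA_SUBSET = {"div", "sp", "speaker", "l", "p", "stage"}

def elements_outside_subset(xml_text, allowed):
    """Return the sorted names of elements used but not in `allowed`."""
    root = ET.fromstring(xml_text)
    # Tags come back as "{namespace}name"; keep only the local name.
    used = {el.tag.split("}")[-1] for el in root.iter()}
    return sorted(used - allowed)

scene = (
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    "<sp><speaker>Hamlet</speaker><l>To be, or not to be</l></sp>"
    "<entry><m>un</m></entry>"  # dictionary elements that do not belong here
    "</div>"
)

outside = elements_outside_subset(scene, DRAMA_SUBSET)
print(outside)
```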
The TEI is not a language that describes how a text should be displayed online or in print: "performative and expressive significance of the input" vs "the aesthetics of the output".
The TEI is not a programming language: encoding your texts in TEI does not automatically do anything to them
Caveat: There are many, many tools for transforming TEI into other formats (Word documents, PDFs, and, of course, websites)
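To give a sense of what such a transformation involves, here is a deliberately tiny sketch that turns the paragraphs of a TEI body into an HTML fragment. It is not one of the established tools (those are typically built with XSLT); the `tei_to_html` function, the sample document, and the choice of `<article>` as a wrapper are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    "<p>First paragraph.</p><p>Second &amp; last.</p>"
    "</body></text></TEI>"
)

def tei_to_html(tei_text):
    """Convert each TEI <p> in the body into an HTML <p>."""
    root = ET.fromstring(tei_text)
    paragraphs = root.findall(".//tei:body/tei:p", TEI_NS)
    # itertext() gathers all text inside the paragraph; escape() makes it
    # safe to place inside HTML.
    parts = ["<p>" + escape("".join(p.itertext())) + "</p>" for p in paragraphs]
    return "<article>" + "".join(parts) + "</article>"

html_output = tei_to_html(SAMPLE)
print(html_output)
```

The encoding itself stays untouched; the display decisions live entirely in the transformation.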
Root <TEI> element
A <teiHeader> that describes both the file and the primary source that you are transcribing (if applicable)
A <text> that contains the text of the document
Within <text>, you can have an optional <front>, a <body>, and an optional <back>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Title</title>
      </titleStmt>
      <publicationStmt>
        <p>Publication Information</p>
      </publicationStmt>
      <sourceDesc>
        <p>Information about the source</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Some text here.</p>
    </body>
  </text>
</TEI>
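Once a document follows this skeleton, its parts can be addressed by path. The sketch below, assuming Python's standard `xml.etree.ElementTree`, reads the title from the header and the paragraphs from the body of the minimal document above. Note that every path step needs the TEI namespace prefix; the `NS` mapping and variable names are choices made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

MINIMAL_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Title</title></titleStmt>
      <publicationStmt><p>Publication Information</p></publicationStmt>
      <sourceDesc><p>Information about the source</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Some text here.</p>
    </body>
  </text>
</TEI>"""

doc = ET.fromstring(MINIMAL_TEI)

# Metadata lives in the header...
title = doc.findtext("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=NS)
header_parts = [el.tag.split("}")[-1] for el in doc.find("tei:teiHeader/tei:fileDesc", NS)]

# ...and the transcription lives in the text.
body_paragraphs = [p.text for p in doc.findall("tei:text/tei:body/tei:p", NS)]

print(title, header_parts, body_paragraphs)
```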
Go to the editor (https://sfu.ca/~takeda/teiworkshop/20211118/editor.html)
Things to tag: