Introduction to TEI and WEA Encoding Workshop

Encoding Screenplays

Joey Takeda, SFU DHIL

March 17, 2023

Encoding, markup, et cetera...

At its core, textual encoding is a way of identifying and differentiating bits of text from other bits of text.

We do this all the time!

Italics for emphasis

Underlining for titles

Bold for extra-emphasis

Quotation marks for outside attribution or skepticism

All capitals to YELL


Encoding, markup, et cetera...

But these are contextual and local

E.g. different types of punctuation for levels of quotation

And they are subject to varying interpretations

E.g. I think these quotation marks denote a term, but maybe the author is just being sarcastic...

Encoding Texts as Literary Criticism

Marking up text is an assertion of your knowledge and your interpretation of the text

What does the text (form and content) express?

The process is analytical, strategic, and interpretive.

It is analytical, in identifying a set of components into which the text can meaningfully be broken and whose relationship can be represented

Markup is strategic, in that text encoding is always aimed (deliberately or by default) at some intellectual or practical goal

And markup is interpretive, in that the act of encoding will always take place through a connection between an observing individual and a source object.

Julia Flanders, Syd Bauman, and Sarah Connell. "Text Encoding." Doing Digital Humanities, edited by Constance Crompton, Richard Lane, and Ray Siemens. Routledge, 2016.

XML

XML = eXtensible Markup Language

XML is not a set language unto itself, but a grammar

XML elements have no inherent function or meaning

It is purely a structure--a way of organizing

Anyone can conceive of an XML dialect (i.e. it is extensible)

XML is Everywhere

HTML (HyperText Markup Language: Every website; strictly speaking, only its XHTML form is XML)

KML (Keyhole Markup Language: Google Maps)

RDF (Resource Description Framework: Library catalogues)

SVG (Scalable Vector Graphics: Digital Images)

OOXML (Office Open XML: This presentation, Word documents, et cetera)

XML Markup

Markup codifies intentions

"Sure"

<quotation>Sure</quotation>

<sarcasm>Sure</sarcasm>

<skepticism>Sure</skepticism>

<title>Sure</title>

XML

XML is hierarchical

XML is a tree-like structure

And is often described in genealogical terms

XML

Think of the hierarchy of the book:

Book

Chapters

Sections

Paragraphs

Sentences

Words

Letters

XML


<book>
    <chapter>
        <section>
            <paragraph>
                <sentence>
                    <word>
                        <letter></letter>
                    </word>
                </sentence>
            </paragraph>
        </section>
    </chapter>
</book>
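To see the tree for yourself, here is a minimal sketch that reads the book example above and prints each element indented by its depth. It uses Python's standard xml.etree.ElementTree module; the `outline` helper and the `BOOK` variable are names made up for this illustration and are not part of the workshop materials.

```python
import xml.etree.ElementTree as ET

# The book hierarchy from the slide above, as a string.
BOOK = """
<book>
    <chapter>
        <section>
            <paragraph>
                <sentence>
                    <word>
                        <letter></letter>
                    </word>
                </sentence>
            </paragraph>
        </section>
    </chapter>
</book>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating over an element yields its children
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(BOOK)
for depth, tag in outline(root):
    # Two spaces of indentation per level of nesting
    print("  " * depth + tag)
```

Each level of indentation corresponds to one generation: book is the parent of chapter, chapter is the child of book, and letter is a descendant of everything above it.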

XML Explained

The two pointy brackets enclose a tag; a start tag, an end tag, and everything between them make up an element

E.g. <book> would be called the book element

All elements have start and end tags (an empty element can be written as a single self-closing tag, e.g. <pb/>)

E.g. <book> is the start tag and </book> is the end tag

XML Explained

Elements can also have attributes and each attribute must have a value

E.g. <book type="primary"> has a type attribute with the value of primary

(Think of attributes as you would in everyday life; people don't have "height" or "age" without a value)
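As a rough illustration of attributes and their values, the sketch below parses a small element with Python's standard xml.etree.ElementTree module and reads the attribute back. The `height` attribute and the `has_valueless_attribute_error` flag are hypothetical, used only to show what happens when an attribute is absent or has no value.

```python
import xml.etree.ElementTree as ET

# An element with one attribute, as on the slide above.
book = ET.fromstring('<book type="primary"></book>')

print(book.tag)            # the element name
print(book.attrib)         # all attributes as a dictionary
print(book.get("type"))    # the value of the type attribute
print(book.get("height"))  # None, because there is no height attribute

# An attribute without a value is not allowed in XML, so parsing fails.
try:
    ET.fromstring("<book type></book>")
    has_valueless_attribute_error = False
except ET.ParseError:
    has_valueless_attribute_error = True

print(has_valueless_attribute_error)
```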

XML Explained

Elements cannot overlap

<sentence><word>Word1</word></sentence> is right

<sentence><word>Word1</sentence></word> is wrong

Elements nest, and their relationships are described in genealogical terms (parent, child, sibling, ancestor, descendant)

There is always a root element
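The rules on this slide can be checked mechanically. The sketch below is one way to do so, assuming Python's standard xml.etree.ElementTree parser, which rejects any input that is not well-formed. The `is_well_formed` helper and the `two_roots` sample are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested: <word> opens and closes inside <sentence>.
nested = "<sentence><word>Word1</word></sentence>"

# Overlapping: <sentence> closes while <word> is still open.
overlapping = "<sentence><word>Word1</sentence></word>"

# No single root element: two siblings with no parent.
two_roots = "<word>Word1</word><word>Word2</word>"

for sample in (nested, overlapping, two_roots):
    print(is_well_formed(sample), sample)
```

Note that this only tests well-formedness (the grammar of XML). Whether a file is valid against a schema such as the TEI is a separate check, which oXygen performs for you.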

The TEI

A set of guidelines for encoding text

A non-profit organization

A community or consortium of users

Website: https://tei-c.org/

The TEI is

A markup language written in XML

Currently in its 5th major revision (P5 4.5.0)

Used by many projects across the world in many different languages and for many different reasons

What the TEI is not

A language that describes how a text should be displayed online or in print: "performative and expressive significance of the input" vs "the aesthetics of the output".

A programming language: encoding your texts in TEI does not automatically do anything to them

Caveat: There are many, many tools for transforming TEI into other formats (Word documents, PDFs, and, of course, websites)

Components of a (basic) TEI file

Root <TEI> element

A <teiHeader> that describes both the file and the primary source that you are transcribing (if applicable)

A <text> that contains the text of the document

Within <text>, you can have an optional <front>, a <body>, and an optional <back>

TEI

                
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Title</title>
         </titleStmt>
         <publicationStmt>
            <p>Publication Information</p>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the source</p>
         </sourceDesc>
      </fileDesc>
  </teiHeader>
  <text>
      <body>
         <p>Some text here.</p>
      </body>
  </text>
</TEI>
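Because TEI does not do anything by itself, some other tool has to read it. As a minimal sketch of what that can look like, the code below pulls the title and the body paragraphs out of the basic file above using Python's standard xml.etree.ElementTree module. The `tei` prefix and the `TEI_NS` and `SAMPLE` names are choices made for this example; only the namespace URI comes from the file itself.

```python
import xml.etree.ElementTree as ET

# Map a prefix of our choosing to the TEI namespace declared on <TEI>.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Title</title>
         </titleStmt>
         <publicationStmt>
            <p>Publication Information</p>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the source</p>
         </sourceDesc>
      </fileDesc>
  </teiHeader>
  <text>
      <body>
         <p>Some text here.</p>
      </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Every TEI element lives in the TEI namespace, so each step needs the prefix.
title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
paragraphs = root.findall("tei:text/tei:body/tei:p", TEI_NS)

print(title.text)
for p in paragraphs:
    print(p.text)
```

The path to the title mirrors the nesting in the file: teiHeader, then fileDesc, then titleStmt, then title.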

Practice: Encoding your Biographical Information

Download:
https://sfu.ca/~takeda/teiworkshop/2023-03-17/wea_bio.zip

Moving to screenshare....

oXygen Tips

To tag an element quickly, highlight the text, press Command + E, and type in the element name

Always make sure the file is valid! Validate, validate, validate! (Red checkmark OR Command + Shift + V)

The TEI

Offers a rich vocabulary and method to encode:

Bibliographic and structural features: page breaks, headers, footers, page numbers, line breaks, divisions, paragraphs, line groups, etc

Interpretative features: stage movement, emphasis, place names, proper names, dialogue direction, etc

Editorial apparatus: hands, witnesses, collation, gaps, additions, deletions, etc

Linguistic features: morphemes, feature structures, orthographic form, etc

Spoken features: incidents, pauses, shifts, "communicative phenomenon", etc

Metadata: various classification schemes, provenance, manuscript description, etc


TEI

Note that the TEI is huge (569 elements)

No one uses the entirety of the TEI tagset

Individual projects customize the TEI for their own needs, usually using a small subset of the overall tagset

E.g. Drama projects will use the drama tagset (<sp> for speech, <speaker> for speaker, et cetera) and discard the linguistic/dictionary tagset (<entry> for dictionary entries, <m> for morpheme, etc).

Schema

The TEI is one big schema: a set of rules about how things are structured

TEI projects usually customize their schema to use only a subset

The WEA uses 151 elements (and, in reality, probably way fewer)

And soon, we'll use more!
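To get a feel for how small a subset a single document uses, here is a rough sketch that counts the distinct elements in a file, using only Python's standard library. The `element_counts` helper and the `SAMPLE` fragment are invented for illustration; this is not how the WEA figure above was calculated.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# A tiny, made-up fragment in the TEI namespace.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <sp><speaker>Paul</speaker><p>Look at that girl.</p></sp>
    <sp><speaker>Life Guard</speaker><p>The water's pretty rough.</p></sp>
  </body>
</text>"""


def element_counts(xml_text):
    """Count how many times each element name occurs in a document."""
    root = ET.fromstring(xml_text)
    # ElementTree writes names as "{namespace}name"; keep only the name.
    return Counter(el.tag.split("}")[-1] for el in root.iter())


counts = element_counts(SAMPLE)
print(len(counts), "distinct elements")
for name, total in sorted(counts.items()):
    print(name, total)
```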

Screenplays

                            
<pb n="1"/>
<byline>
   Treatment and dialogue by:
   <name ref="pers:WE1">Winnifred Reeve</name>
</byline>
<noteGrp type="annotation" place="top right">
   <note>From P.1 to 38 this all new original
   material by Reeve</note>
   <note>Also last sequences practically 
   original except for block incident</note>
</noteGrp>
<head>Ropes</head>
<opener>
   <byline>By
      <lb/><name>Wilbur Daniel
      Steele</name>
   </byline>
</opener>

                
 <p><wea:slugline>Look-out station.</wea:slugline> Paul is 
 smiling as he looks through glasses. The other boy, 
 also with glasses, is looking through them at the water.</p>
   <sp>
      <speaker>Life Guard</speaker>
      <p>The water's pretty rough.</p>
   </sp>
   <sp>
      <speaker>Paul</speaker>
      <p>Look at that girl.</p>
   </sp>
   <sp>
      <speaker>Life Guard</speaker>
      <p>And there's a powerful undertow off there.</p>
   </sp>
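One payoff of tagging speeches this way is that dialogue becomes easy to extract. The sketch below lists who says what, using Python's standard xml.etree.ElementTree module. It makes some assumptions: the speeches are wrapped in a <div> in the TEI namespace so that the fragment has a root element, the <wea:slugline> paragraph is left out because its namespace is not shown here, and the `speeches` helper is a made-up name.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The three speeches from the slide, wrapped in an assumed <div> root.
SCENE = """<div xmlns="http://www.tei-c.org/ns/1.0">
   <sp>
      <speaker>Life Guard</speaker>
      <p>The water's pretty rough.</p>
   </sp>
   <sp>
      <speaker>Paul</speaker>
      <p>Look at that girl.</p>
   </sp>
   <sp>
      <speaker>Life Guard</speaker>
      <p>And there's a powerful undertow off there.</p>
   </sp>
</div>"""


def speeches(scene_root):
    """Return a list of (speaker, spoken text) pairs, in document order."""
    result = []
    for sp in scene_root.findall(".//tei:sp", TEI_NS):
        speaker = sp.findtext("tei:speaker", default="", namespaces=TEI_NS)
        # itertext() also collects text inside any child elements of <p>
        lines = ["".join(p.itertext()).strip() for p in sp.findall("tei:p", TEI_NS)]
        result.append((speaker.strip(), " ".join(lines)))
    return result


scene = ET.fromstring(SCENE)
for speaker, line in speeches(scene):
    print(f"{speaker}: {line}")
```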

The Guidelines: Some Examples

Encoding

Setting Up

Download the encoding package from
https://jenkins.hcmc.uvic.ca/job/WEA/lastSuccessfulBuild/artifact/products/site/contribute.html

Open in oXygen

Setting Up

Change the purple lines at the top of the file (the xml-model processing instructions) so that they point to "https://www.sfu.ca/~takeda/teiworkshop/2023-03-17/screenplay.rng" instead of "../sch/wea.rng"


Before:

<?xml-model href="../sch/wea.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="../sch/wea.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

After:

<?xml-model href="https://www.sfu.ca/~takeda/teiworkshop/2023-03-17/screenplay.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://www.sfu.ca/~takeda/teiworkshop/2023-03-17/screenplay.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
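In the workshop this change is made by hand in oXygen. If you ever need to make it in several files, the same swap could be sketched in plain Python as below. The `point_to_workshop_schema` helper and the `<TEI/>` stand-in document are hypothetical; only the two schema locations come from the slide.

```python
OLD_SCHEMA = "../sch/wea.rng"
NEW_SCHEMA = "https://www.sfu.ca/~takeda/teiworkshop/2023-03-17/screenplay.rng"


def point_to_workshop_schema(xml_text):
    """Swap the schema location in xml-model lines, leaving all else alone."""
    updated = []
    for line in xml_text.splitlines(keepends=True):
        # Only touch lines that start with an xml-model processing instruction
        if line.lstrip().startswith("<?xml-model"):
            line = line.replace(f'href="{OLD_SCHEMA}"', f'href="{NEW_SCHEMA}"')
        updated.append(line)
    return "".join(updated)


before = (
    '<?xml-model href="../sch/wea.rng" type="application/xml" '
    'schematypens="http://relaxng.org/ns/structure/1.0"?>\n'
    '<?xml-model href="../sch/wea.rng" type="application/xml" '
    'schematypens="http://purl.oclc.org/dsdl/schematron"?>\n'
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"/>\n'
)

after = point_to_workshop_schema(before)
print(after)
```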

Basic Components

Paragraphs with <p>

Speeches tagged with <sp>; speaker tagged with <speaker>; text tagged with <p>

Headings tagged with <head>

Page beginnings: <pb/>

Page numbers: <fw type="pageNum">

Tips

If you notice a feature you want to capture, look through the TEI Guidelines for an element that might fit

Use Ropes as an example (but note that the encoding is still experimental)

When in doubt, ask!

Links

Presentation:
https://sfu.ca/~takeda/teiworkshop/2023-03-17/

Ropes:
https://sfu.ca/~takeda/teiworkshop/2023-03-17/Ropes.xml

TEI Guidelines:
https://tei-c.org