Sample answers and commentary for the assignment due 14 October 2012.
Questions are shown as follows:
For the TEI vocabulary, answer the recurrent document-design questions (below). Find the answers either by consulting the spec(s) or by trying things against the DTDs.
Expected answers are shown for each question, formatted like this paragraph.
For the most part, the answers given here are based on TEI P5, the current version of the TEI vocabulary. But it is possible that in some cases my memory of TEI P3 and TEI P4 has led me to say things that are true of those versions of TEI but not of TEI P5.
The questions are these:
Sections and headings:
Does the vocabulary have markup for sections?
For section headings?
Do all sections have headings?
TEI provides generic markup for sections using
the div element, or alternatively
the elements
div1,
div2,
div3,
div4,
div5,
div6, and
div7, where the numeric suffix
indicates the depth of nesting.
Section headings are uniformly marked
head; headings are optional,
as are other specialized elements for the
beginnings of sections (byline,
dateline, epigraph, etc.).
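A minimal sketch of the two styles of section markup described above. The type and n values and all of the text content are invented for illustration; the two fragments are alternatives, not parts of one document.

```xml
<!-- Style 1: unnumbered div elements, nested to any depth -->
<body>
  <div type="chapter" n="1">
    <head>Chapter One</head>
    <p>First paragraph of the chapter.</p>
    <div type="section">
      <head>A nested section</head>
      <p>Text of the nested section.</p>
    </div>
    <div type="section">
      <!-- no head here: headings are optional -->
      <p>Text of an untitled section.</p>
    </div>
  </div>
</body>

<!-- Style 2: numbered divisions; the suffix gives the nesting depth -->
<body>
  <div1 type="chapter" n="1">
    <head>Chapter One</head>
    <p>First paragraph of the chapter.</p>
    <div2 type="section">
      <head>A nested section</head>
      <p>Text of the nested section.</p>
    </div2>
  </div1>
</body>
```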
Lists, paragraphs, and notes:
Does the vocabulary have markup for lists? paragraphs? footnotes? endnotes? block notes?
Can lists occur between paragraphs?
Can lists occur within paragraphs?
Do paragraphs occur within list items?
Ditto for notes.
Lists of all kinds are marked up using
the list element; distinctions
among bulleted (or unordered) lists, numbered
(or ordered) lists, and glossary or definition
lists are made using the type attribute.
(In some vocabularies, including ISO 8879 Annex E
and HTML, the same distinctions are made by
using distinct element types.)
Paragraphs are tagged using p,
notes (both footnotes and endnotes) using
note.
Lists and notes can occur both within and between
paragraphs (so they behave in some ways like
paragraph-level elements and in some ways like
phrase-level elements). Both list items and
notes may contain p elements,
but neither is required to: when the
list item would contain a single p element,
the tags for the p element may be
omitted, and the item element serves
as the container for the paragraph. (So also for
single-paragraph notes.)
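The following fragment is a sketch of the possibilities just described, using the type values discussed in this answer. The place values on note and all of the text content are assumptions chosen for illustration.

```xml
<!-- A list and a note inside a paragraph -->
<p>The recipe needs three things:
  <list type="bulleted">
    <item>flour</item>
    <item>water</item>
    <!-- an item may also hold full paragraphs -->
    <item>
      <p>salt.</p>
      <p>Coarse salt works best.</p>
    </item>
  </list>
  and some patience.<note place="bottom">A single-paragraph note
  needs no p element.</note></p>

<!-- A list and a note between paragraphs -->
<list type="ordered">
  <item>Mix.</item>
  <item>Bake.</item>
</list>
<note place="end">
  <p>A longer note.</p>
  <p>It has two paragraphs of its own.</p>
</note>
```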
The type attribute on list
is a paradigm case of a semi-closed list: the
documentation lists the values
ordered,
bulleted,
simple, and
gloss, with the suggestion that
software be prepared to handle those types
properly. Other values can be provided, however,
when needed.
TEI does provide specialized elements for some
kinds of list: listBibl for
lists of bibliographic references,
listWit for lists of text-critical
witnesses (manuscripts and editions), and
listOrg,
listEvent,
listPerson,
listPlace, and
listNym for lists of
organizations, events, persons, places, and
names, respectively.
In retrospect, this user of TEI believes that
it was a mistake not to separate glossaries
out from bulleted and numbered lists. The
required content of glossaries is different,
and existing TEI software is inconsistent in whether
it uses the type attribute or the
existence of label elements among
the children of a list to trigger the special
handling needed for conventional layout of glossary
lists.
It was also a mistake to specify the core of the content
model as (item+ | (label, item)+):
when labels are provided, the label and item belong
together and it would be more useful to tag them
as a single unit. The failure to provide a grouping
element for term and definition is particularly
irritating when processing glossary lists; the fact
that the error is shared by other vocabularies is
no consolation.
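A short sketch of a glossary list under the content model quoted above (the terms and definitions are invented). Note that nothing in the markup groups each label with its item; a processor has to pair them up by position.

```xml
<list type="gloss">
  <label>div</label>
  <item>a generic, unnumbered text division</item>
  <label>head</label>
  <item>a heading for a division, list, or similar unit</item>
  <label>p</label>
  <item>a paragraph</item>
</list>
```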
And while in a confessional vein, I'll say that
experience in recent years has made me prefer
a different approach to paragraphs inside list items
and notes. Making single-paragraph list
items appear as <item>...</item>
instead of as <item><p>...</p></item>
seemed like a good idea at the time, but it is not
significantly more convenient for authoring purposes
(especially not with a good XML editor), and
it is significantly less convenient for processing.
Phrase-level markup:
What sort of character- or phrase-level markup is allowed? (If something is italic, can I say why it's italic? Must I?)
The TEI has a profusion of specialized phrase-level
elements intended to identify typographically distinct
phrases and to specify, at the same time, the reason
for their special typographic treatment.
Section 3.3
of the TEI Guidelines describes the elements most
obviously relevant to this question (foreign,
emph,
distinct,
mentioned,
term), but the
rest of chapter 3 defines a number of other phrase-level
elements available in the TEI's ‘core’ tag set.
Other chapters define phrase-level elements suitable for
specialized types of document or for specialized interests.
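As an illustration, the fragment below tags several phrases that a printed source might set in italics, each with an element saying why. The sentences and the type value on distinct are invented; the last sentence uses hi, which records only the typography.

```xml
<p>She called it a <foreign xml:lang="fr">faux pas</foreign>,
and she was <emph>not</emph> amused.
The word <mentioned>markup</mentioned> has a long history, and a
<term>semi-closed list</term> is one whose documented values may be extended.
His <distinct type="archaic">prithee</distinct> fooled no one.
Compare <hi rend="italic">this phrase</hi>, which is merely italic.</p>
```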
Direct discourse and quotation
Does the vocabulary have markup for run-on quotations? block quotations?
Can they be used for direct discourse in narrative?
Quotations, whether set as display quotations or run in,
are tagged using q,
quote,
and/or
said.
The element said is provided specifically
for tagging direct discourse, but direct discourse may
also be tagged using q.
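A sketch of the three elements in use, assuming TEI P5. The sentences are invented, and the who value points at a hypothetical xml:id for the speaker.

```xml
<!-- direct discourse in narrative -->
<p><said who="#alice">I never agreed to that,</said> Alice replied.</p>

<!-- quotation attributed to some other source -->
<p>As the report puts it,
<quote>the results were inconclusive</quote>.</p>

<!-- material set off by quotation marks, reason unspecified -->
<p>The so-called <q>experts</q> disagreed.</p>
```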
Some participants in the development of the TEI vocabulary leaned toward the view that direct discourse in a novel is a form of quotation, and they wished the same element to be used both for direct discourse (viewed as quotation of the characters in a narrative) and for quotation in expository prose (actual quotation, typically of other authors).
Other participants objected to this, on the grounds that the words attributed to characters in a novel were, in reality, composed by the novel's author, and so do not constitute quotation. It was important, they felt, to distinguish what was actually written by the author of the text from what was written by others, in part because quantitative studies of style or careful linguistic analysis might wish to exclude quoted material from the corpus used to characterize an author. (If an author tends toward relatively short sentences, a few quotations from Kant could completely distort the author's statistical profile.)
TEI P3 and P4 attempted to navigate this dispute
by providing q for use both for ‘real’
quotation and for direct discourse, and quote for
projects which wished to distinguish ‘real’
quotation. Unfortunately, it's not always possible to tell
when a quotation is real. (Writers of fiction, in particular, may
purport to quote other authors when they have in fact written
the allegedly quoted material themselves.) So the documentation
for quote allows it to be used for any
purported quotation. This makes it very hard to
understand the intended distinction between q and
quote.
In TEI P5, the analysis has been changed in a way that
some may regard as an improvement and others as a further
muddying of the waters.
A new said element is introduced specifically for
direct discourse (both in fiction and in reportage).
And the scope of q is
broadened to include anything with quotation marks around it.
This seems to make q serve the same
purpose for material in quotation marks
as hi serves for material in italics or boldface
(i.e. it has a solely typographic meaning, with no information
on the reason for the special typographic treatment).
Poetry, drama
Does the vocabulary have markup for verse? drama?
The lg and l (lower-case el) elements
for verse, and the
sp (speech),
speaker,
and
stage elements for drama are part of the TEI core.
Additional elements for verse and for dramatic texts are provided in chapters 6 and 7 of the Guidelines, respectively.
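A minimal sketch of the core elements named above. The lines, speaker, and stage directions are placeholders, and the type and who values are assumptions for illustration.

```xml
<!-- verse: a line group containing lines -->
<lg type="stanza">
  <l>First line of the stanza,</l>
  <l>second line of the stanza.</l>
</lg>

<!-- drama: a speech with its speaker label, plus stage directions -->
<stage type="entrance">Enter a messenger.</stage>
<sp who="#messenger">
  <speaker>Messenger</speaker>
  <p>My lord, the ships have arrived.</p>
</sp>
<stage type="exit">Exit.</stage>
```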
Textual variation
What happens if I must record different readings in different sources of the work?
The app element, defined in
chapter 12, Critical Apparatus,
is designed to hold information about individual points of
variation among the witnesses to a text. It contains
elements for the variant readings, which can be
identified as belonging to specific witnesses.
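A sketch of a single point of variation, assuming three witnesses with the hypothetical identifiers A, B, and C. The text and the witness descriptions are invented.

```xml
<!-- the witnesses are declared once, typically in the header or front matter -->
<listWit>
  <witness xml:id="A">First edition</witness>
  <witness xml:id="B">Second edition</witness>
  <witness xml:id="C">Author's manuscript</witness>
</listWit>

<!-- each reading names the witnesses that attest it -->
<l>The river ran
  <app>
    <rdg wit="#A #B">slowly</rdg>
    <rdg wit="#C">softly</rdg>
  </app>
  to the sea.</l>
```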
The choice element in the core tag set
can be used for simple cases of variation:
regularized or normalized spelling vs old spelling,
corrected vs uncorrected text,
abbreviation vs expanded form.
It is not intended or suitable for recording (say) the
differences between the Folio and Quarto texts of
a play by Shakespeare, or the textual variations
among different editions of a poem by Yeats.
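The three simple cases listed above might be encoded as in this sketch (the sentence is invented). Each choice holds alternative encodings of the same stretch of text.

```xml
<p>He <choice><orig>luv'd</orig><reg>loved</reg></choice> the
<choice><sic>recieved</sic><corr>received</corr></choice> text,
according to
<choice><abbr>Dr.</abbr><expan>Doctor</expan></choice> Smith.</p>
```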
Annotation
Does the vocabulary support arbitrary annotation of the document? Annotation of fixed types?
In addition to the generic note element,
there are provisions for a wide variety of
forms of annotation which differ in formality
and weight.
There are predefined elements for some linguistic
analysis (s,
cl,
phr,
w,
and m for sentences, clauses, phrases, words, and morphemes;
they can carry attributes with specification of their
type and linguistic function).
The interp element is provided to
allow light-weight unstructured annotation;
the fs (feature-structure) element and
a whole family of elements for its content provide
for tightly structured annotations.
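A sketch showing the lightweight and the structured styles side by side. The identifiers, type values, and feature names are all assumptions made up for this example; the ana attribute links the annotated text to its annotations.

```xml
<p>
  <s ana="#int1">
    <w ana="#fs1">Time</w>
    <w>flies</w>.
  </s>
</p>

<!-- lightweight, unstructured annotation -->
<interp xml:id="int1" type="genre">proverb</interp>

<!-- tightly structured annotation: a feature structure -->
<fs xml:id="fs1" type="word">
  <f name="pos"><symbol value="noun"/></f>
  <f name="number"><symbol value="singular"/></f>
</fs>
```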
It is not clear why the interp element
is needed, given that note already
exists and provides the same facility for unstructured
annotation that can be linked to arbitrary locations in the
text.
Some of those involved in developing the TEI vocabulary
wanted a separate element to distinguish the notes
present in the exemplar of an electronic text from annotations
added by others in the course of their work with an
electronic text.
This is not a particularly compelling argument,
since note has
type
and
resp attributes precisely in order to
allow notes from different annotators to be distinguished.
But in the end, including the separate element seemed
likely to make the TEI vocabulary more palatable to some
potential users. (And it could not be argued that the TEI
never included multiple element types that mean the same
or similar things. Perhaps it should not do so, but
that ship had sailed very early.)
Feature-structure markup was developed by the TEI working group on linguistic analysis, as feature structures are a commonplace in linguistic analysis. The mechanism is general enough, however, that it can be used for essentially any kind of structured information. When carried to its logical extreme, feature structure markup is an alternative both to XML itself and to database management systems. Perhaps fortunately, it is seldom carried to that extreme.
An example of non-linguistic use of TEI feature structures is the CATMA system developed by Jan Christoph Meister, a narratologist at the University of Hamburg, for annotation of narrative texts.
Hypertext
Does the vocabulary support hyperlinking? Outgoing? Incoming?
Hyperlinks are represented in TEI using the
ref and ptr elements.
(The former has textual content, the latter
does not; the expectation is that the formatting
or rendering system will supply appropriate text
such as the number or title of the section being
pointed at, or the number of the figure if it is
a figure being pointed
at, etc.)
Both incoming and outgoing links may be asserted using the
link element.
The ref and ptr elements
resemble the a element of HTML (when it
carries an href attribute): they assert
the existence of a hyperlink, one end-point of which is
located at the ref, ptr,
or a element itself. That is, one
end-point of the hyperlink whose existence is asserted
is invariably located at the element asserting the link.
The link element, by contrast, asserts
the existence of a hyperlink whose end-points are all
distinct from the link element itself.
It may thus be used to assert hyperlinks whose endpoints
are all located in other documents.
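A sketch of the three elements, with hypothetical identifiers, file names, and URL. It assumes a recent P5 release, in which the attribute on link is named target and holds two or more pointers (earlier releases spelled it targets).

```xml
<!-- one end-point is the ref or ptr element itself -->
<p>See <ref target="#sec-lists">the section on lists</ref>,
or simply <ptr target="#sec-lists"/>.
An outgoing link:
<ref target="http://www.example.org/">an external page</ref>.</p>

<!-- both end-points lie elsewhere, here in two other documents -->
<link target="edition.xml#p12 commentary.xml#n7"/>
```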
There was a time when hypertext theorists regarded the necessity for placing a link assertion at one of the link's end-points as a sign of a crude and primitive hypertext system. (This is one reason hypertext enthusiasts were not originally much impressed by the World Wide Web.)
In TEI P3 and P4, ref and ptr
were restricted to internal links, while the separate elements
xref and xptr were provided
for external links. By the time TEI P5 was developed,
experience with the World Wide Web had made such a
distinction appear arbitrary and unhelpful, so the
two pairs of elements were merged.
Dates and other low-level datatypes
Does the vocabulary have markup for dates? numbers? weights and measures? times of day? URIs?
There are TEI elements for all the special items
mentioned except weights (for which use measure)
and URIs.
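A sketch of these elements in running text, assuming TEI P5 attribute names (when, value, quantity, unit, commodity). The sentences and values are invented.

```xml
<p>The meeting began on
<date when="2012-10-14">14 October 2012</date> at
<time when="09:30:00">half past nine</time>.
<num type="cardinal" value="12">Twelve</num> people attended,
and they drank
<measure quantity="2.5" unit="kg" commodity="coffee">two and a
half kilos of coffee</measure>.</p>
```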
Metadata — inline? external?
Does the vocabulary allow the XML document to identify itself using internal metadata? External metadata?
TEI provides an inline header (called teiHeader)
with (a) a full bibliographic description of the
electronic document itself, (b) optional descriptions
of non-bibliographic properties of the electronic
document itself and the work it instantiates, and (c) a
change history to keep track of modifications to the document.
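A minimal sketch of a header with the three kinds of information lettered above. All content is invented, and the who value points at a hypothetical xml:id.

```xml
<teiHeader>
  <!-- (a) bibliographic description of the electronic document -->
  <fileDesc>
    <titleStmt>
      <title>A sample text: an electronic transcription</title>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished sample file.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a printed pamphlet.</p>
    </sourceDesc>
  </fileDesc>
  <!-- (b) non-bibliographic properties -->
  <profileDesc>
    <langUsage>
      <language ident="en">English</language>
    </langUsage>
  </profileDesc>
  <!-- (c) change history -->
  <revisionDesc>
    <change when="2012-10-14" who="#ed">Created the file.</change>
  </revisionDesc>
</teiHeader>
```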
For some questions, additional comments that go beyond the expected answer are also shown, formatted like this paragraph.