Document modeling
Introduction to DTDs
C. M. Sperberg-McQueen, Black Mesa Technologies LLC
Rev. 18 September 2012
Overview
- Organizational notes
- Vocabulary tour (?)
- Introduction
- DTD syntax and semantics
Organizational notes
Bureaucracy, paperwork, and so on ...
Discussion logs
The syllabus says your work will include:
- (once or twice during the semester) preparation
of a summary of the discussion in a class session
Sign up!
Any volunteer for today?
Vocabulary introductions
The syllabus says your work will include:
- (once during the semester) a class presentation on an important XML
vocabulary; you will be responsible for doing the necessary
preparation, briefing the class on the origin and goals of the
vocabulary, and providing a page with links to the defining documents
for the vocabulary, the schema(s) for the vocabulary, and available
documentation. If you are particularly energetic, you may also prepare
to show the class how the vocabulary addresses the various concrete
document-modeling and design questions we will ask about each
vocabulary we look at.
Sign up!
Any volunteer for richtext / RFC 1341 (next week)?
Any requests for additional vocabularies?
Questions from last week?
Anything we need to clear up before proceeding?
Tour of vocabularies, cont'd
Introduction
Why does this come up?
Review
Programs map input to output.
What happens when the input is garbage?
Can we define garbage?
Can we define what non-garbage input is?
You are here
Overview
Representation of individual documents
- introduction to SGML and XML
- syntax (angle brackets)
- model (trees)
- → DTDs
- historical survey
The escape to the meta-level
We started out talking about representing documents.
Q. Why are we now starting to talk about
defining classes of documents?
A. Because SGML was developed by
pluralists: instead of prescribing a vocabulary for use by
everyone, they defined a way for everyone to define their
own vocabulary.
Q. How do you go about that?
Q. How do you define a vocabulary?
Q. How do you define a vocabulary
for people to use to define vocabularies?
The challenge of GIGO
Programs map input to output.
What happens when the input is garbage?
Can we define garbage?
Can we define what non-garbage input is?
Milestones in computing history
1960:
Report on the Algorithmic Language ALGOL 60
uses a formal specification (Backus-Naur Form) of the ALGOL grammar.
- succinct
- clear*
- formal*
- effective, automatable
Milestones in computing history, II
1986:
ISO 8879: Standard Generalized Markup Language (SGML)
provides for formal specification of document grammars.
- succinct
- clear*
- formal*
- effective, automatable
Basics of DTDs
In which we roll our sleeves up.
DTDs in XML
- common syntax patterns
- constructs (syntax and semantics)
- document type declarations
- element declarations
- attribute list declarations
- general entity declarations
- notation declarations
- parameter entity declarations
Common syntax patterns
All markup declarations take the form
<! (markup-declaration open delimiter, mdo)
keyword (to indicate type of declaration)
parameters
> (markup-declaration close delimiter, mdc)
<!keyword parameter parameter parameter ... >
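A minimal sketch of the pattern: each declaration below opens with <!, names its kind with a keyword, lists its parameters, and closes with >. The names note, lang, bmt, and png are invented for illustration, and the declarations are placed in an internal subset so the example is a complete document.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- keyword     parameters ... -->
  <!ELEMENT  note  (#PCDATA)>
  <!ATTLIST  note  lang CDATA #IMPLIED>
  <!ENTITY   bmt   "Black Mesa Technologies">
  <!NOTATION png   SYSTEM "image/png">
]>
<note lang="en">Slides by &bmt;.</note>
```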
Document type declarations
<!DOCTYPE
name
external identifier
optional ‘internal subset’
>
Examples (in Oxygen)
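A sketch of the shapes a document type declaration can take. The element names, the file name memo.dtd, and the public identifier are assumptions made up for this example. The forms with an external identifier appear only in the comment, so that the document itself needs no external file.

```xml
<?xml version="1.0"?>
<!-- Forms with an external identifier (hypothetical names):
       <!DOCTYPE memo SYSTEM "memo.dtd">
       <!DOCTYPE memo PUBLIC "-//Example//DTD Memo 1.0//EN" "memo.dtd">
     The declaration below uses only an internal subset. -->
<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Class</to>
  <body>Sign up for a discussion log.</body>
</memo>
```

The name after <!DOCTYPE must match the document's root element (here memo) for the document to be valid.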
Element declarations
<!ELEMENT
name
content model or keyword (ANY, EMPTY)
>
Expressions
Content models are expressions in a simple language.
An expr is*:
- element name
- ( + expr + zero or more (comma + expr) + )
- ( + expr + zero or more (or-bar + expr) + )
- expr plus optional ?, *, or +
Examples (in Oxygen)
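One way the expression rules can combine in practice: a sequence (comma), a choice (or-bar), and the occurrence indicators ?, +, and *. The recipe vocabulary is invented for this sketch. It also uses (#PCDATA), the form for elements that contain only character data, which is not one of the four expression rules listed above.

```xml
<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!-- a sequence; ? = optional, + = one or more, * = zero or more -->
  <!ELEMENT recipe     (title, note?, ingredient+, (step | tip)*, photo?)>
  <!-- character data only -->
  <!ELEMENT title      (#PCDATA)>
  <!ELEMENT note       (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT step       (#PCDATA)>
  <!ELEMENT tip        (#PCDATA)>
  <!-- a keyword in place of a content model -->
  <!ELEMENT photo      EMPTY>
]>
<recipe>
  <title>Toast</title>
  <ingredient>bread</ingredient>
  <step>Toast the bread.</step>
  <tip>Watch it closely.</tip>
  <step>Serve.</step>
  <photo/>
</recipe>
```

Moving <tip> in front of <ingredient>, or leaving out <title>, would make the instance invalid against this model.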
Attribute-list declarations
<!ATTLIST
element-name
one or more attribute declarations:
- attribute name
- attribute type
- default value or keyword (#REQUIRED, #IMPLIED, or #FIXED plus a value)
>
Examples (in Oxygen)
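A sketch of one attribute-list declaration carrying four attribute declarations, each with a name, a type, and a default value or keyword. The element and attribute names are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
            id      ID              #REQUIRED
            status  (draft | final) "draft"
            owner   CDATA           #IMPLIED
            version CDATA           #FIXED "1.0">
]>
<item id="i1" owner="editor">An item whose status defaults to draft.</item>
```

Here id must be supplied, status falls back to "draft" when omitted, owner may be left out with no default, and version, if given at all, must be "1.0".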
Attribute types
- CDATA
- ID
- IDREF, IDREFS
- ENTITY, ENTITIES
- NMTOKEN, NMTOKENS
- enumerated type
Examples (in Oxygen)
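The sketch below exercises most of the types in the list; all names are invented. ENTITIES and NMTOKEN are not shown. They behave like ENTITY and NMTOKENS, except that ENTITIES takes a space-separated list and NMTOKEN takes a single token.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc     (section+, figure?)>
  <!ELEMENT section (#PCDATA)>
  <!ELEMENT figure  EMPTY>
  <!NOTATION png    SYSTEM "image/png">
  <!ENTITY diagram  SYSTEM "diagram.png" NDATA png>
  <!ATTLIST section
            id       ID                 #REQUIRED
            next     IDREF              #IMPLIED
            seealso  IDREFS             #IMPLIED
            keywords NMTOKENS           #IMPLIED
            title    CDATA              #IMPLIED
            level    (intro | advanced) "intro">
  <!-- an ENTITY value must name a declared unparsed entity -->
  <!ATTLIST figure
            src      ENTITY             #REQUIRED>
]>
<doc>
  <section id="s1" next="s2" keywords="dtd xml" title="Basics">First.</section>
  <section id="s2" seealso="s1 s3" level="advanced">Second.</section>
  <section id="s3">Third.</section>
  <figure src="diagram"/>
</doc>
```

A validating parser checks that every ID value is unique in the document and that every IDREF or IDREFS value matches some ID.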
General entity declarations
<!ENTITY
name
one of
- replacement string
- external identifier, optionally followed by the keyword NDATA and a notation name
>
Examples (in Oxygen)
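A sketch of the three kinds of general entity: internal (replacement string), external parsed (external identifier), and external unparsed (external identifier plus NDATA). The entity names and file names are assumptions made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (#PCDATA | logo)*>
  <!ELEMENT logo   EMPTY>
  <!ATTLIST logo   file ENTITY #REQUIRED>
  <!NOTATION png   SYSTEM "image/png">
  <!-- internal entity: a replacement string -->
  <!ENTITY course   "Document modeling">
  <!-- external parsed entity: declared but not referenced below,
       so the example needs no extra file -->
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
  <!-- external unparsed entity: external identifier plus NDATA -->
  <!ENTITY logo-img SYSTEM "logo.png" NDATA png>
]>
<report>Notes for &course;. <logo file="logo-img"/></report>
```

Parsed entities are referenced in content as &name;. An unparsed entity is never referenced that way; it is only named in an attribute of type ENTITY or ENTITIES.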
Notation declarations
<!NOTATION
name
external identifier
>
Examples (in Oxygen)
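A sketch of two notation declarations, one with a system identifier only and one with a public identifier as well. The identifiers and element names are hypothetical. The example also uses the NOTATION attribute type, which is part of XML but not on the list of attribute types above. XML does not allow a NOTATION attribute on an element declared EMPTY, so image is given text content here.

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (image+)>
  <!ELEMENT image   (#PCDATA)>
  <!NOTATION png  SYSTEM "image/png">
  <!NOTATION jpeg PUBLIC "-//Example//NOTATION JPEG//EN" "image/jpeg">
  <!ATTLIST image
            src    CDATA                 #REQUIRED
            format NOTATION (png | jpeg) #REQUIRED>
]>
<gallery>
  <image src="cat.png" format="png">A cat</image>
  <image src="dog.jpg" format="jpeg">A dog</image>
</gallery>
```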
Parameter entity declarations
<!ENTITY
%
name
one of
- replacement string
- external identifier
>
Examples (in Oxygen)
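A sketch of parameter entities in use, shown as the contents of a hypothetical external DTD file. XML does not allow parameter-entity references inside other declarations in the internal subset, so this usage belongs in an external DTD. The name phrase follows the %phrase; reference used elsewhere in these slides; the other names are invented.

```xml
<!-- contents of a hypothetical external DTD file, e.g. memo.dtd -->

<!-- parameter entities with replacement strings -->
<!ENTITY % phrase      "emph | term">
<!ENTITY % common.atts "id ID #IMPLIED">

<!-- a parameter entity with an external identifier; a later %shared;
     reference would pull in that file's declarations -->
<!ENTITY % shared SYSTEM "shared.ent">

<!-- references are written %name; and are expanded inside the DTD -->
<!ELEMENT para (#PCDATA | %phrase;)*>
<!ELEMENT emph (#PCDATA)>
<!ELEMENT term (#PCDATA)>
<!ATTLIST para %common.atts;>
```

After expansion, the para declaration reads (#PCDATA | emph | term)*, so adding a new phrase-level element means changing one entity rather than every content model that uses it.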
Examples
- Noah and the flood (pre-flood, post-flood)
- Ordering a pizza
- A four-course meal
- ...
Play stump the chump (time permitting)?
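As one possible starting point for the four-course meal, the sketch below models the courses as a fixed sequence with a choice for the second course. The element names and the choice of courses are assumptions, not the model worked out in class.

```xml
<?xml version="1.0"?>
<!DOCTYPE meal [
  <!ELEMENT meal      (appetizer, (soup | salad), main, dessert)>
  <!ELEMENT appetizer (#PCDATA)>
  <!ELEMENT soup      (#PCDATA)>
  <!ELEMENT salad     (#PCDATA)>
  <!ELEMENT main      (#PCDATA)>
  <!ELEMENT dessert   (#PCDATA)>
]>
<meal>
  <appetizer>olives</appetizer>
  <soup>minestrone</soup>
  <main>risotto</main>
  <dessert>gelato</dessert>
</meal>
```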
SGML DTDs
Some constructs you'll see.
What got left out of XML DTDs
Several SGML DTD constructs
support features omitted from XML.
- SHORTREF (<!SHORTREF foo "bar" baz>, <!USEMAP foo barracuda>)
- RANK
- and-connector (&)
- case folding of element, attribute names
What got trimmed out of XML DTDs
Several SGML DTD constructs were trimmed down in XML.
- default entities (entity name #DEFAULT)
- tag omissibility
(<!ELEMENT list-item - O (#PCDATA | %phrase;)*>)
- attribute types (NUMBER, NUMBERS, NUTOKEN, NUTOKENS)
- inclusion and exclusion exceptions
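A sketch of how some of these constructs look in an SGML DTD. This is SGML, not XML, and an XML parser will reject it. It is a fragment with invented element names, and the declarations for title, author, date, emph, and item are omitted.

```sgml
<!-- and-connector: title, author, and date may occur in any order -->
<!ELEMENT front    - - (title & author & date?)>

<!-- inclusion exception: footnote may appear anywhere inside para;
     "- O" means the start-tag is required and the end-tag omissible -->
<!ELEMENT para     - O (#PCDATA | emph)* +(footnote)>

<!-- exclusion exception: no footnote inside a footnote -->
<!ELEMENT footnote - - (para+) -(footnote)>

<!-- an attribute type dropped from XML -->
<!ELEMENT list     - - (item+)>
<!ATTLIST list     start NUMBER #IMPLIED>
```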
Assignments
Due: Sunday morning 23 September 2012.
1 Do an HTML 2.0 version of your document.
(We'll need to make sure everyone knows where the DTD is.)
2 Do a version of your document using an XML
version (to be supplied) of the DTD in ISO 8879 Annex E.
3 Propose a topic for a term paper or major project,
to be presented to the group later in the term.
(Goal: have these finalized by end of 2 October.)