Document modeling

Introduction to DTDs

C. M. Sperberg-McQueen, Black Mesa Technologies LLC

Rev. 18 September 2012


Overview

Organizational notes

Bureaucracy, paperwork, and so on ...

Discussion logs

The syllabus says your work will include:
  • (once or twice during the semester) preparation of a summary of the discussion in a class session
Sign up!
Any volunteer for today?

Vocabulary introductions

The syllabus says your work will include:
  • (once during the semester) a class presentation on an important XML vocabulary; you will be responsible for doing the necessary preparation, briefing the class on the origin and goals of the vocabulary, and providing a page with links to the defining documents for the vocabulary, the schema(s) for the vocabulary, and available documentation. If you are particularly energetic, you may also prepare to show the class how the vocabulary addresses the various concrete document-modeling and design questions we will ask about each vocabulary we look at.
Sign up!
Any volunteer for richtext / RFC 1341 (next week)?
Any requests for additional vocabularies?

Questions from last week?

Anything we need to clear up before proceeding?

Tour of vocabularies, cont'd

Introduction

Why does this come up?

Review

Programs map input to output.
What happens when the input is garbage?

Can we define garbage?

Can we define what non-garbage input is?

You are here

Overview
Representation of individual documents
  • introduction to SGML and XML
    • syntax (angle brackets)
    • model (trees)
    • → DTDs
  • historical survey
Schema languages
  • DTDs
  • XSD
  • Relax NG

The escape to the meta-level

We started out talking about representing documents.
Q. Why are we now starting to talk about defining classes of documents?

A. Because SGML was developed by pluralists: instead of prescribing a vocabulary for use by everyone, they defined a way for everyone to define their own vocabulary.

Q. How do you go about that?

Q. How do you define a vocabulary?

Q. How do you define a vocabulary for people to use to define vocabularies?

The challenge of GIGO

Programs map input to output.
What happens when the input is garbage?

Can we define garbage?

Can we define what non-garbage input is?

Milestones in computing history

1960: Report on the Algorithmic Language ALGOL 60 uses a formal specification of the ALGOL grammar.
  • succinct
  • clear*
  • formal*
  • effective, automatable

Milestones in computing history, II

1986: ISO 8879: Standard Generalized Markup Language (SGML) provides for formal specification of document grammars.
  • succinct
  • clear*
  • formal*
  • effective, automatable

Basics of DTDs

In which we roll our sleeves up.

DTDs in XML

Common syntax patterns

All markup declarations take the form
<! (markup-declaration open delimiter, mdo)
keyword (to indicate type of declaration)
parameters
> (markup-declaration close delimiter, mdc)
<!keyword parameter parameter parameter ... >

Document type declarations

<!DOCTYPE
name
external identifier
optional ‘internal subset’
>
Examples (in Oxygen)
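
For reference outside the live demo, a minimal sketch of a document type declaration with all three parts. The file name memo.dtd and the element names are hypothetical, and the document is valid only on the assumption that memo.dtd declares memo and from.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- name:                memo (must match the root element)
     external identifier: SYSTEM "memo.dtd" (hypothetical file)
     internal subset:     the declarations between [ and ] -->
<!DOCTYPE memo SYSTEM "memo.dtd" [
  <!ENTITY dept "Department of Document Modeling">
]>
<memo>
  <from>&dept;</from>
</memo>
```

An external identifier may also use the keyword PUBLIC, followed by a public identifier and then a system identifier.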

Element declarations

<!ELEMENT
name
content model or keyword (ANY, EMPTY)
>

Expressions

Content models are expressions in a simple language. An expr is*:
  • element name
  • ( + expr + zero or more (comma + expr) + )
  • ( + expr + zero or more (or-bar + expr) + )
  • expr plus optional ?, *, or +
Examples (in Oxygen)
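
A sketch of element declarations that exercise each kind of expression listed above. The recipe vocabulary is invented for illustration. It also uses (#PCDATA) and EMPTY, which go beyond the four simplified rules on this slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe [
  <!-- sequence (comma): title, then ingredients, then steps, in that order -->
  <!ELEMENT recipe      (title, ingredients, steps)>
  <!-- occurrence indicator +: one or more -->
  <!ELEMENT ingredients (ingredient+)>
  <!-- occurrence indicator ?: quantity is optional -->
  <!ELEMENT ingredient  (quantity?, name)>
  <!-- choice (or-bar) with *: any mix of step, note, and pause, possibly none -->
  <!ELEMENT steps       (step | note | pause)*>
  <!-- keyword instead of a content model -->
  <!ELEMENT pause       EMPTY>
  <!-- character data only -->
  <!ELEMENT title       (#PCDATA)>
  <!ELEMENT quantity    (#PCDATA)>
  <!ELEMENT name        (#PCDATA)>
  <!ELEMENT step        (#PCDATA)>
  <!ELEMENT note        (#PCDATA)>
]>
<recipe>
  <title>Toast</title>
  <ingredients>
    <ingredient><quantity>2 slices</quantity><name>bread</name></ingredient>
    <ingredient><name>butter</name></ingredient>
  </ingredients>
  <steps>
    <step>Toast the bread.</step>
    <note>Watch it closely.</note>
    <pause/>
    <step>Spread the butter.</step>
  </steps>
</recipe>
```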

Attribute-list declarations

<!ATTLIST
element-name
one or more attribute declarations:
  • attribute name
  • attribute type
  • default value or keyword
>
Examples (in Oxygen)
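
A sketch of one attribute-list declaration carrying four attribute declarations, one for each kind of default (#REQUIRED, #IMPLIED, a literal default value, and #FIXED). The element and attribute names are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!--          name     type             default value or keyword -->
  <!ATTLIST note
                id       ID               #REQUIRED
                lang     NMTOKEN          #IMPLIED
                status   (draft | final)  "draft"
                version  CDATA            #FIXED "1.0">
]>
<!-- lang is omitted (allowed by #IMPLIED); version is supplied by the DTD -->
<note id="n1" status="final">Remember the milk.</note>
```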

Attribute types

  • CDATA
  • ID
  • IDREF, IDREFS
  • ENTITY, ENTITIES
  • NMTOKEN, NMTOKENS
  • enumerated type
Examples (in Oxygen)
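
A sketch that uses each attribute type from the list above at least once. All names are invented, and logo.png is a hypothetical file. The ENTITY type needs an unparsed entity, which in turn needs a notation, so both are declared here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!ELEMENT doc  (sec+, xref*, fig*)>
  <!ELEMENT sec  (#PCDATA)>
  <!ELEMENT xref EMPTY>
  <!ELEMENT fig  EMPTY>

  <!NOTATION png SYSTEM "image/png">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>

  <!ATTLIST sec
      id     ID           #REQUIRED
      title  CDATA        #IMPLIED
      class  NMTOKENS     #IMPLIED
      level  (1 | 2 | 3)  "1">
  <!ATTLIST xref
      target IDREF        #REQUIRED
      also   IDREFS       #IMPLIED>
  <!ATTLIST fig
      src    ENTITY       #REQUIRED>
]>
<doc>
  <sec id="intro" title="Introduction" class="front unnumbered">...</sec>
  <sec id="body" level="2">...</sec>
  <!-- every IDREF value must match some ID value in the document -->
  <xref target="intro" also="intro body"/>
  <!-- the ENTITY value names the unparsed entity declared above -->
  <fig src="logo"/>
</doc>
```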

General entity declarations

<!ENTITY
name
one of
  • replacement string
  • external identifier and optional keyword
>
Examples (in Oxygen)
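
A sketch of the three forms of general entity declaration: a replacement string, an external identifier alone, and an external identifier with the NDATA keyword. The entity names and the files chapter2.xml and photo.jpg are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE report [
  <!ELEMENT report (para+, figure?)>
  <!ELEMENT para   (#PCDATA)>
  <!ELEMENT figure EMPTY>
  <!ATTLIST figure file ENTITY #REQUIRED>
  <!NOTATION jpeg SYSTEM "image/jpeg">

  <!-- internal entity: replacement string -->
  <!ENTITY course   "Document Modeling">
  <!-- external parsed entity: external identifier, no keyword
       (declared but not referenced below, so the file is never needed) -->
  <!ENTITY chapter2 SYSTEM "chapter2.xml">
  <!-- external unparsed entity: external identifier, NDATA, notation name -->
  <!ENTITY photo    SYSTEM "photo.jpg" NDATA jpeg>
]>
<report>
  <para>Welcome to &course;.</para>
  <figure file="photo"/>
</report>
```

Parsed entities are referenced in content as &name;. An unparsed entity is never referenced that way; it is named in an attribute of type ENTITY or ENTITIES.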

Notation declarations

<!NOTATION
name
external identifier
>
Examples (in Oxygen)
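
A sketch of two notation declarations, one with a system identifier and one with a public identifier only. The public identifier shown is made up for illustration, as are the entity and file names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (image+)>
  <!ELEMENT image   EMPTY>
  <!ATTLIST image src ENTITY #REQUIRED>

  <!-- external identifier with SYSTEM -->
  <!NOTATION gif SYSTEM "image/gif">
  <!-- public identifier only (hypothetical identifier) -->
  <!NOTATION png PUBLIC "-//EXAMPLE//NOTATION Portable Network Graphics//EN">

  <!-- each unparsed entity names the notation its data is in -->
  <!ENTITY sunset SYSTEM "sunset.gif" NDATA gif>
  <!ENTITY harbor SYSTEM "harbor.png" NDATA png>
]>
<gallery>
  <image src="sunset"/>
  <image src="harbor"/>
</gallery>
```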

Parameter entity declarations

<!ENTITY
%
name
one of
  • replacement string
  • external identifier
>
Examples (in Oxygen)
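
A sketch of parameter entities used to factor out a repeated content-model fragment and a shared set of attributes. This is written as the contents of an external DTD file (hypothetical name: article.dtd), because in the internal subset a parameter-entity reference may appear only between declarations, not inside one. The file tables.dtd is also hypothetical.

```xml
<!-- parameter entity with a replacement string: a content-model fragment -->
<!ENTITY % phrase      "emph | cite | code">
<!-- parameter entity with a replacement string: shared attribute declarations -->
<!ENTITY % common.atts "id    ID       #IMPLIED
                        class NMTOKENS #IMPLIED">
<!-- parameter entity with an external identifier; writing %tables; between
     two declarations would pull in the declarations from that file -->
<!ENTITY % tables SYSTEM "tables.dtd">

<!ELEMENT article (para+)>
<!ATTLIST article %common.atts;>

<!-- expands to (#PCDATA | emph | cite | code)* -->
<!ELEMENT para (#PCDATA | %phrase;)*>
<!ATTLIST para %common.atts;>

<!ELEMENT emph (#PCDATA)>
<!ELEMENT cite (#PCDATA)>
<!ELEMENT code (#PCDATA)>
```

Parameter entities are referenced as %name; and only within the DTD; general entities are referenced as &name; in the document.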

Examples

Play stump the chump (time permitting)?

SGML DTDs

Some constructs you'll see.

What got left out of XML DTDs

Several SGML DTD constructs support features omitted from XML.
  • SHORTREF (<!SHORTREF foo "bar" baz>, <!USEMAP foo barracuda>)
  • RANK
  • and-connector (&)
  • case folding of element, attribute names

What got trimmed out of XML DTDs

Several SGML DTD constructs were trimmed down in XML.
  • default entities (entity name #DEFAULT)
  • tag omissibility (<!ELEMENT list-item - O (#PCDATA | %phrase;)*>)
  • attribute types (NUMBER, NUMBERS, NUTOKEN, NUTOKENS)
  • inclusion and exclusion exceptions
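
A sketch of what several of these constructs look like in an SGML DTD, for recognition when reading older DTDs. None of this is legal in an XML DTD. The vocabulary is invented, apart from the list-item declaration quoted above, and the sketch assumes a concrete syntax that allows names as long as list-item.

```sgml
<!ENTITY % phrase "emph | cite">

<!-- default entity: used for any reference to an undeclared entity -->
<!ENTITY #DEFAULT "[undefined entity]">

<!-- and-connector: title, author, and optional date, in any order -->
<!ELEMENT front     - - (title & author & date?)>

<!-- inclusion exception: footnote may appear anywhere inside body -->
<!ELEMENT body      - - (para | list)+ +(footnote)>

<!-- exclusion exception: no footnote anywhere inside a footnote -->
<!ELEMENT footnote  - - (para+) -(footnote)>

<!ELEMENT list      - - (list-item+)>

<!-- tag omissibility: start-tag required (-), end-tag omissible (O) -->
<!ELEMENT list-item - O (#PCDATA | %phrase;)*>
<!ELEMENT para      - O (#PCDATA | %phrase;)*>

<!-- SGML-only attribute type -->
<!ATTLIST list-item n NUMBER #IMPLIED>

<!-- one declaration for a group of element types -->
<!ELEMENT (title | author | date | emph | cite) - - (#PCDATA)>
```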

Assignments

Due: Sunday morning 23 September 2012.
1 Do an HTML 2.0 version of your document. (We'll need to make sure everyone knows where the DTD is.)
2 Do a version of your document using an XML version (to be supplied) of the DTD in ISO 8879 Annex E.
3 Propose a topic for a term paper or major project, to be presented to the group later in the term. (Goal: have these finalized by end of 2 October.)