
What is transcription?

Part 2

C. M. Sperberg-McQueen

Claus Huitfeldt

Yves Marcoux

Digital Humanities 2009, 24 June 2009



Transcriptions and assertions

Assertions model 1

Assertions model 2

Assertions model 3

The document level

How documents say things

Tokens and types

Formalizations (in Alloy)

Types have identity, but we specify no other properties for them.
abstract sig Type {}
Tokens map to types.
abstract sig Token {
  type : Type
}
Basic tokens (characters, lexical items) map to basic types.
sig Basic_Type extends Type {}
sig Basic_Token extends Token {}{
  type in Basic_Type
}
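The token/type distinction can also be illustrated outside Alloy. The following is a minimal Python sketch, not part of the original formalization; the class names `BasicType` and `BasicToken` and the variable names are hypothetical stand-ins for the signatures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicType:
    """A type: it has identity (here, a name) and no other properties."""
    name: str


@dataclass(eq=False)
class BasicToken:
    """A token: a distinct mark that maps to exactly one type.

    eq=False keeps Python's default identity comparison, so two tokens
    of the same type are still two different tokens.
    """
    type: BasicType


t_z = BasicType("z")

# Two marks on the page, both instances of the type 'z'.
k_first, k_second = BasicToken(t_z), BasicToken(t_z)

same_type = k_first.type == k_second.type   # compares the types
same_token = k_first == k_second            # compares the tokens themselves
```

Here `same_type` is true and `same_token` is false: the two tokens share a type without being the same token.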


Regions are a kind of token; they may have a set or sequence of subregions.
abstract sig Region extends Token {
  subregions : set Region
}{
  type in S_Unit
  type.children = subregions.@type
}

sig Ordered_Region extends Region {
  sub_seq : seq Region
}{
  elems[sub_seq] = subregions
  type in Ordered_S_Unit
  type.ch_seq = sub_seq.@type
}

sig Unordered_Region extends Region {}{
  type in Unordered_S_Unit
}

Structural units

Regions are realizations / instantiations of structural units, which are similarly ordered or unordered.
abstract sig S_Unit extends Type {
  kind : lone Kind,
  props : set AVPair,
  children : set S_Unit
}

sig Ordered_S_Unit extends S_Unit {
  ch_seq : seq S_Unit
}{
  elems[ch_seq] = children
}

sig Unordered_S_Unit extends S_Unit {}

Token and type sequences

The lowest level of region is a sequence of basic tokens; the lowest level of structural unit is a sequence of basic types (a text flow).
sig Text_Flow extends S_Unit {
  types : seq Basic_Type
}{
  kind = PCData
  no children
}
sig Token_Sequence extends Region {
  tokens : seq Basic_Token
}{
  type in Text_Flow
  type.types = tokens.@type
  no subregions
}
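The key constraint here is `type.types = tokens.@type`: a token sequence instantiates a text flow only if its tokens, read in order, map to exactly the flow's sequence of basic types. A rough Python sketch of that check follows; it assumes tokens are represented as (identifier, type) pairs and basic types as single characters, and the names `TextFlow`, `TokenSequence`, and `is_well_typed` are illustrative, not taken from the model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextFlow:
    # Sequence of basic types, one character per type (an assumption).
    types: List[str]


@dataclass
class TokenSequence:
    # Sequence of (token identifier, type) pairs (an assumption).
    tokens: List[Tuple[str, str]]
    type: TextFlow


def is_well_typed(ts: TokenSequence) -> bool:
    """Mirror of the Alloy constraint type.types = tokens.@type."""
    return ts.type.types == [typ for _, typ in ts.tokens]


flow = TextFlow(types=list("Woyzeck"))
good = TokenSequence(
    tokens=[(f"k{i:02d}", ch) for i, ch in enumerate("Woyzeck")], type=flow
)
bad = TokenSequence(
    tokens=[(f"k{i:02d}", ch) for i, ch in enumerate("Wozzeck")], type=flow
)

good_ok = is_well_typed(good)
bad_ok = is_well_typed(bad)
```

With these inputs `good_ok` is true and `bad_ok` is false, since the third token of `bad` maps to 'z' where the flow requires 'y'.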

Example: Büchner's Woyzeck

Sentential form

(∃ d : Document)
(∃ r1, r2, ... : Region)
(∃ ts01, ts02, ... : Token_Sequence)
(∃ k00, k01, k02, k03, k04, k05, k06 : Basic_Token)
(∃ s1, s2, ... : Structural_Unit)
(∃ f1, f2, ... : Text_Flow)
(∃ t_a, t_A, t_b, t_B, t_c, ..., t_e, ... t_k, ... t_o, ... t_W, ... t_y, t_z, ... : Basic_Type)
       token_type(k00,t_W)
     ∧ token_type(k01,t_o)
     ∧ token_type(k02,t_y)
     ∧ token_type(k03,t_z)
     ∧ token_type(k04,t_e)
     ∧ token_type(k05,t_c)
     ∧ token_type(k06,t_k)
     ∧ ts_tokens(ts01, 〈 k00, k01, k02, k03, k04, k05, k06 〉 )
     ∧ token_type(ts01, f1)
     // N.B. implies tf_types(f1, "Woyzeck")
     // “There are seven tokens in d which fit together in a sequence, and which spell out the text flow ‘Woyzeck’.”

     ∧ ordreg_subregions(r1, 〈 ts01 〉 )
     ∧ osu_children(s1, 〈 f1 〉)
     ∧ token_type(r1, s1)
     ∧ su_kind(s1, speaker_attribution)
     // “The text flow ‘Woyzeck’ is a speaker attribution” ...
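The token-level conjuncts follow a regular pattern, so they can be generated mechanically from the string being transcribed. The sketch below is only an illustration: the function name `sentential_form` is hypothetical, the naming scheme (`k00`, `t_W`, `ts01`) is copied from the slide, and the existential prefix and region-level conjuncts are left out.

```python
from typing import List


def sentential_form(text: str, seq_name: str = "ts01") -> List[str]:
    """Return the token-level conjuncts for one token sequence.

    Each character of `text` gets one token k00, k01, ... mapped to the
    basic type t_<character>; a final conjunct lists the tokens in order.
    """
    names = [f"k{i:02d}" for i in range(len(text))]
    conjuncts = [f"token_type({k},t_{ch})" for k, ch in zip(names, text)]
    conjuncts.append(f"ts_tokens({seq_name}, 〈 {', '.join(names)} 〉 )")
    return conjuncts


woyzeck = sentential_form("Woyzeck")
formula = "\n     ∧ ".join(woyzeck)
```

For "Woyzeck" this yields seven `token_type` conjuncts followed by one `ts_tokens` conjunct, joined by ∧ in `formula`.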

When the world was young and simple

In the Garden of Eden,
  • Tokens are easily distinguished from other marks.
  • Each token (at any level) maps to exactly one type.
  • Each token sequence has exactly one sequence over its tokens.
  • Each ordered region has exactly one sequence over its subregions.
That is, every document is determinate and univocal.
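The second condition, that each token maps to exactly one type, is easy to state as a check over a set of token-to-type claims. This is a minimal Python sketch under the assumption that claims are given as (token, type) pairs; the function name `is_univocal` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def is_univocal(claims: Iterable[Tuple[str, str]]) -> bool:
    """True if no token is assigned two different types."""
    seen: Dict[str, str] = {}
    for token, typ in claims:
        # setdefault records the first type seen for this token and
        # returns it; a later, different type means equivocation.
        if seen.setdefault(token, typ) != typ:
            return False
    return True


eden = [("k01", "o"), ("k02", "y"), ("k03", "z")]
fallen = eden + [("k02", "z")]   # k02 read both as 'y' and as 'z'

eden_ok = is_univocal(eden)
fallen_ok = is_univocal(fallen)
```

`eden_ok` is true; `fallen_ok` is false, because token `k02` is claimed to map to two different types.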

After the Fall

Some documents are indeterminate. (Cf. Roude, Cayless, Stokes papers.)
The Marburg edition writes Woyzeck.
Franzos transcribed it Wozzeck.


What can happen by chance can also be willed.
An ‘inversion’ by Scott Kim.
And hypertext? Don't get us started.

Modeling indeterminacy and equivocation

The readings level

So much for the document level.
Now for the readings level.

Relativizing contradictions

Contradictions → paralysis.
An evasion: move from
  Region r4 reads ‘Wozzeck’ and region r4 reads ‘Woyzeck’.
to
  F (Franzos) says that region r4 reads ‘Wozzeck’ and M (the Marburg edition) says that region r4 reads ‘Woyzeck’.
(Escape to the meta-level.)

Formalizing readings

At the document level, we had
sig Document {
  r : Region,
  s : S_Unit
}

Formalizing readings

At the readings level, we have
sig Document {
  r : Region,
  s : S_Unit
}
sig Reading {
  d : Document,
  r : Region,
  s : S_Unit
}
With the meaning ‘At the document level, reading R claims that document d has top-level region (token) r mapping to top-level structural unit (type) s’.
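Once claims are attached to readings, two incompatible claims about the same region no longer contradict each other; they simply mark the region as disputed. The Python sketch below illustrates this under simplifying assumptions: a reading is reduced to a single (reader, document, region, text) record, whereas the Alloy `Reading` relates a document to a whole region and structural unit. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Reading:
    reader: str     # who makes the claim, e.g. "F" or "M"
    document: str   # the document the claim is about
    region: str     # the region being read
    text: str       # what the reader says the region reads


def disputed_regions(readings: Iterable[Reading]) -> Set[Tuple[str, str]]:
    """Return the (document, region) pairs on which readings disagree."""
    by_region: Dict[Tuple[str, str], Set[str]] = {}
    for rd in readings:
        by_region.setdefault((rd.document, rd.region), set()).add(rd.text)
    return {key for key, texts in by_region.items() if len(texts) > 1}


readings = [
    Reading("F", "d", "r4", "Wozzeck"),
    Reading("M", "d", "r4", "Woyzeck"),
]
disputed = disputed_regions(readings)
```

Both records coexist in `readings`, and `disputed` contains the single pair `("d", "r4")`.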

Reading E in absentia

How do we get from
  • a reading of a transcription
to
  • a reading of the exemplar?
Two ways for transcriptions to convey readings of exemplars:
  • re-instantiation (copying): T instantiates exactly the same types as E.
  • level-splitting: T is not a copy of E but a recipe for making a reading of E.

Concluding remarks

Topics for further work


Today is Québec National Day.
Bonne fête, Québec!