What is transcription?

Part 2

C. M. Sperberg-McQueen

Claus Huitfeldt

Yves Marcoux

Digital Humanities 2009, 24 June 2009

http://www.blackmesatech.com/2009/06/dh2009


Introduction

Overview

Transcriptions and assertions

Assertions model 1

Assertions model 2

Assertions model 3

The document level

How documents say things

Tokens and types

Formalizations (in Alloy)

Types have identity, but we specify no other properties for them.
abstract sig Type {}
Tokens map to types.
abstract sig Token {
  type : Type
}
Basic tokens (characters, lexical items) map to basic types.
sig Basic_Type extends Type {}
sig Basic_Token extends Token {}{
  type in Basic_Type 
}

Regions

Regions are a kind of token; they may have a set or sequence of subregions.
abstract sig Region extends Token {
  subregions : set Region
}{ 
  type in S_Unit
  type.children = subregions.@type
}

sig Ordered_Region extends Region {
  sub_seq : seq Region
}{
  elems[sub_seq] = subregions
  type in Ordered_S_Unit
  type.ch_seq = sub_seq.@type
}

sig Unordered_Region extends Region {}{
  type in Unordered_S_Unit
}

Structural units

Regions are realizations / instantiations of structural units, which are similarly ordered or unordered.
abstract sig S_Unit extends Type {
  kind : lone Kind,
  props : set AVPair,
  children : set S_Unit
}

sig Ordered_S_Unit extends S_Unit {
  ch_seq : seq S_Unit
}{
  elems[ch_seq] = children
}

sig Unordered_S_Unit extends S_Unit {}

Token and type sequences

The lowest level of region is a sequence of basic tokens; the lowest level of structural unit is a sequence of basic types (a text flow).
sig Text_Flow extends S_Unit {
  types : seq Basic_Type
}{
  kind = PCData
  no children 
}
sig Token_Sequence extends Region {
  tokens : seq Basic_Token
}{
  type in Text_Flow
  type.types = tokens.@type
  no subregions
}

Example: Büchner's Woyzeck

Büchner's Woyzeck

Sentential form

(∃ d : Document)
(∃ r1, r2, ... : Region)
(∃ ts01, ts02, ... : Token_Sequence)
(∃ k00, k01, k02, k03, k04, k05, k06 : Basic_Token)
(∃ s1, s2, ... : S_Unit)
(∃ f1, f2, ... : Text_Flow)
(∃ t_a, t_A, t_b, t_B, t_c, ..., t_e, ... t_k, ... t_o, ... t_W, ... t_y, t_z, ... : Basic_Type)
token_type(k00,t_W)
     ∧ token_type(k01,t_o)
     ∧ token_type(k02,t_y)
     ∧ token_type(k03,t_z)
     ∧ token_type(k04,t_e)
     ∧ token_type(k05,t_c)
     ∧ token_type(k06,t_k)
     ∧ ts_tokens(ts01, 〈 k00, k01, k02, k03, k04, k05, k06 〉 )
     ∧ token_type(ts01, f1)
     // N.B. implies tf_types(f1, "Woyzeck")
     // “There are seven tokens in d which fit together in a sequence, and which spell out the text flow ‘Woyzeck’.”

     ∧ ordreg_subregions(r1, 〈 ts01 〉 )
     ∧ osu_children(s1, 〈 f1 〉)
     ∧ token_type(r1, s1)
     ∧ su_kind(s1, speaker_attribution)
     // “The text flow ‘Woyzeck’ is a speaker attribution” ...

When the world was young and simple

In the Garden of Eden,
  • Tokens are easily distinguished from other marks.
  • Each token (at any level) maps to exactly one type.
  • Each token sequence has exactly one sequence over its tokens.
  • Each ordered region has exactly one sequence over its subregions.
That is, every document is determinate and univocal.
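
A minimal Alloy sketch of the second bullet, assuming only the Type and Token signatures from the formalization. The assertion name Univocal and the scope of 5 are illustrative choices, not part of the original model.

```alloy
module eden_sketch

abstract sig Type {}

abstract sig Token {
  // A field declared without a multiplicity keyword defaults to "one":
  // every token has exactly one type.
  type : Type
}

sig Basic_Type extends Type {}

sig Basic_Token extends Token {}{
  type in Basic_Type
}

// Univocality (hypothetical name): no token lacks a type,
// and no token maps to more than one type.
assert Univocal {
  all t : Token | one t.type
}

check Univocal for 5
```

Because the declaration `type : Type` already carries the multiplicity `one`, the Alloy Analyzer should find no counterexample: the Garden of Eden assumption is built into the signature itself.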

After the Fall

Some documents are indeterminate. (Cf. Roude, Cayless, Stokes papers.)
The Marburg edition writes Woyzeck.
Franzos transcribed it Wozzeck.

Equivocation

What can happen by chance can also be willed.
An ‘inversion’ by Scott Kim.
And hypertext? Don't get us started.

Modeling indeterminacy and equivocation

The readings level

So much for the document level.
Now for the readings level.

Relativizing contradictions

Contradictions → paralysis.
An evasion: from
Region r4 reads ‘Wozzeck’ and region r4 reads ‘Woyzeck’.
make
F says that region r4 reads ‘Wozzeck’ and M says that region r4 reads ‘Woyzeck’.
(Escape to the meta-level.)
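
The evasion can be sketched in Alloy. This is an illustration under stated assumptions, not the authors' model: the signature Text, the field reads, and the predicate names are hypothetical; only F, M, r4 and the two spellings come from the slide.

```alloy
module relativize_sketch

abstract sig Text {}
one sig Wozzeck, Woyzeck extends Text {}

sig Region {}
one sig r4 extends Region {}

abstract sig Reading {
  // Assumption: within one reading, a region reads as at most one text.
  reads : Region -> lone Text
}
one sig F, M extends Reading {}

// Relativized: each claim is attributed to its own reading.
pred relativized {
  F.reads[r4] = Wozzeck
  M.reads[r4] = Woyzeck
}

// Unrelativized: a single reading asserts both texts for r4.
pred unrelativized {
  some x : Reading | Wozzeck + Woyzeck in x.reads[r4]
}

run relativized for 3
run unrelativized for 3
```

Under these assumptions the first command should find an instance, while the second should find none: the contradiction survives only when both claims are forced into a single reading.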

Formalizing readings

At the document level, we had
sig Document {
  r : Region,
  s : S_Unit
}

Formalizing readings

At the readings level, we have
sig Document {
  r : Region,
  s : S_Unit
}
sig Reading {
  d : Document,
  r : Region,
  s : S_Unit
}
With the meaning ‘At the document level, reading R claims that document d has top-level region (token) r mapping to top-level structural unit (type) s’.
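
A hedged sketch of how two readings of one document might be compared. To stay short and self-contained, Region, S_Unit and Document are reduced to bare signatures here (a simplification of the model above), and the predicate disagree is a hypothetical helper, not part of the original formalization.

```alloy
module readings_sketch

// Simplified stand-ins for the signatures defined earlier.
sig Region {}
sig S_Unit {}
sig Document {}

sig Reading {
  d : Document,
  r : Region,
  s : S_Unit
}

// Two readings disagree when they concern the same document but
// claim different top-level regions or different structural units.
pred disagree[x, y : Reading] {
  x.d = y.d
  x.r != y.r or x.s != y.s
}

run { some disj x, y : Reading | disagree[x, y] } for 3
```

The run command asks the Analyzer for an instance with two distinct readings that disagree about a single document; no fact in this sketch rules that out, which is the point of moving to the readings level.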

Reading E in absentia

How do we get from a reading of a transcription to a reading of the exemplar?
Two ways for transcriptions to convey readings of exemplars:
  • re-instantiation (copying): T instantiates exactly the same types as E.
  • level-splitting: T is not a copy of E but a recipe for making a reading of E.
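
Re-instantiation can be sketched in terms of the token/type model: two distinct regions (tokens) that map to the same structural unit (type). The signatures below are pared-down versions of the earlier ones, and the predicate name copy_of is a hypothetical choice. Level-splitting is not modeled in this sketch.

```alloy
module copy_sketch

abstract sig Type {}
abstract sig Token {
  type : Type
}

sig S_Unit extends Type {}
sig Region extends Token {}{
  type in S_Unit
}

// t re-instantiates e: a different token with exactly the same type.
pred copy_of[t, e : Region] {
  t != e
  t.type = e.type
}

run { some t, e : Region | copy_of[t, e] } for 3
```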

Concluding remarks

Topics for further work

Merci

Today is Québec National Day.
Bonne fête, Québec!