module graphemes
/* Experiments with distinctive-feature analysis for graphemes.
Given some set of graphemes (e.g. the twenty-six characters of the
lower-case English alphabet), and some set of binary (or n-ary)
distinctive features (e.g. ASCENDER, DESCENDER, BOWL, OPEN, CROSS,
...), together with information about the values of particular
features for particular graphemes (ASCENDER is true for b, d, f, h, k,
l, and maybe t; DESCENDER is true for g, j, p, q, at least in the
simple grade-school script I'm taking as my guide), we'd like to
establish which sets of features suffice to distinguish the graphemes
from each other, and also which sufficient sets could be reduced to
smaller sets by eliminating features.
It's not clear that this is the best way to perform this work, but it
seems like an interesting approach. */
/* This module describes the abstractions; the sets of graphemes being
analysed and the features being proposed should go in a separate
module. */
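/* A minimal sketch of what such a separate module might look like,
   assuming a toy alphabet of three graphemes and two features.  The
   module name toy_graphemes and every sig and fact name below are
   hypothetical, chosen only for illustration; they do not come from
   any real analysis.

     module toy_graphemes
     open graphemes

     one sig b, p, o extends Grapheme {}
     one sig ASCENDER, DESCENDER extends Feature {}

     fact toy_values {
       ASCENDER.sic = b
       ASCENDER.non = p + o
       no ASCENDER.unk
       DESCENDER.sic = p
       DESCENDER.non = b + o
       no DESCENDER.unk
     }

     run { distinctive[Feature, Grapheme] }

   Here b is (pos, neg), p is (neg, pos) and o is (neg, neg) for
   (ASCENDER, DESCENDER), so the two features together should be
   enough to tell the three graphemes apart. */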
/* We assume we have things called Graphemes, and things called
(distinctive) Features.
A common way of defining -emic units, given a set of features, is to
specify for each unit (here, for each grapheme) which binary features
it has, or (if we have any non-binary features) its value for each
relevant feature. But for our immediate purposes here it's more
convenient to point the other way: for a given feature, we specify
three sets: the graphemes that have the feature (sic), those that lack
it (non), and those for which we don't know, or it doesn't matter
(unk). */
abstract sig Grapheme {}
abstract sig Feature {
  sic : set Grapheme,
  non : set Grapheme,
  unk : set Grapheme
}{
  // for any feature f, the sets f.sic, f.non, f.unk are disjoint
  no (sic & non)
  no (sic & unk)
  no (non & unk)
  // for any feature f, every grapheme is in one of the three sets
  (sic + non + unk) = Grapheme
}
/* We want to be able to say, for each grapheme, what feature-value
pairs it has. So we need things called values. For the moment, we
assume three values: pos (positive, yes, true, +), neg (negative, no,
false, -), and unk (unknown, unspecified, does not apply, don't care,
...). */
abstract sig Value {}
one sig pos, neg, unk extends Value {}
/* The function fv[G, F] returns the value of a given feature, for a
given grapheme. */
fun fv[g : Grapheme, f : Feature] : Value {
  (g in f.sic) => pos
  else (g in f.non) => neg
  else unk
}
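/* A sanity check on fv, offered as a sketch: the assertion name
   fv_agrees_with_sets is an addition for illustration, not part of
   the original design.  Because sic and non are disjoint for every
   feature, fv should return pos exactly for the graphemes in f.sic
   and neg exactly for those in f.non. */
assert fv_agrees_with_sets {
  all g : Grapheme, f : Feature |
    (fv[g, f] = pos iff g in f.sic) and
    (fv[g, f] = neg iff g in f.non)
}
check fv_agrees_with_sets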
/* The predicate indistinct[FS, G1, G2] is true for a given feature
set FS and two graphemes G1 and G2 if and only if every feature in FS
has the same value for G1 and G2. */
pred indistinct[fs : set Feature, g1, g2 : Grapheme] {
  all f : fs | fv[g1,f] = fv[g2,f]
}
/* The function clump[G, FS] returns the set of all graphemes
indistinguishable from grapheme G given feature set FS. */
fun clump[g : Grapheme, fs : set Feature] : set Grapheme {
  { g2 : Grapheme | indistinct[fs, g, g2] }
}
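/* Since indistinct is reflexive (every feature trivially has the same
   value for G and G), a clump should always contain the grapheme it
   was built from.  The assertion below, whose name is hypothetical
   and added only as a check, lets the analyser confirm this within
   the default scope. */
assert clump_contains_self {
  all g : Grapheme, fs : set Feature | g in clump[g, fs]
}
check clump_contains_self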
/* The function fset[G] returns the set of features for which
grapheme G has the value 'pos'. */
fun fset[g : Grapheme] : set Feature {
  { f : Feature | g in f.sic }
}
/* The predicate show_indistinct[G, GS] is true if GS is the set of
graphemes indistinct from grapheme G under all the features defined,
and GS has more than one member. (If it has only one member, that
will be G itself.) Given a set of graphemes and a set of features
that we are extending one feature at a time, we can run this
predicate after each new feature is defined; each time we run it, the
Alloy analyser will show us a set of graphemes not distinguished by
the features defined thus far. When it shows no instances, we have a
set of features that allows us to distinguish all of the graphemes
defined. */
pred show_indistinct[g : Grapheme, gs : set Grapheme] {
  gs = clump[g, Feature]
  #gs > 1
}
run show_indistinct
/* The predicate distinctive[FS, GS] is true for a given feature set
FS and a given set of graphemes GS if and only if for any two distinct
graphemes G1 and G2 in GS, there is at least one feature F in FS for
which G1 and G2 have different values. */
pred distinctive[fs : set Feature, gs : set Grapheme] {
  all disj g1, g2 : gs | some f : fs |
    fv[g1,f] != fv[g2,f]
}
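/* The opening comment also asks which sufficient feature sets could
   be reduced by eliminating features.  One way to express that, as a
   sketch only (the name irreducible is an assumption, not part of the
   original module): FS is irreducible for GS if it is distinctive and
   stops being distinctive as soon as any single feature is removed.
   Testing single removals should be enough, because adding features
   to a distinctive set can never make it less distinctive. */
pred irreducible[fs : set Feature, gs : set Grapheme] {
  distinctive[fs, gs]
  all f : fs | not distinctive[fs - f, gs]
}
run irreducible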
// Revisions:
// 2017-04-28 : CMSMcQ : added some commentary
// 2010-03-26 : CMSMcQ : made file