Writing Sound Changes With Lexurgy SC¶
This page is a tour through the features of Lexurgy SC. Start here if you’re new to Lexurgy.
Once you’re comfortable with the basics, the Cheat Sheet is a handy reference.
As you read, be sure to try out each example in the Lexurgy SC web app. Full example files can be found at Examples page.
Basic Structures¶
Writing Rules¶
Enter the following into the “Sound Changes” box:
k-before-a:
k => ʃ / _ a
This rule changes the sound [k] to the sound [ʃ], but only immediately before the sound [a]. If you put some example words (e.g. “koka” and “sisa”) in the “Input Words” box and click “Apply”, you should see that “koka” turns into “koʃa”, while words that don’t match the rule at all, like “sisa”, don’t change.
All rules need to have the =>
symbol to separate the match side of
the expression (k
in the example), which indicates what to change,
from the result side of the expression (ʃ
in the example),
which indicates what it should change into. The example also has
an environment (_ a
), meaning “before the sound [a]”; the underscore
represents the location of the changing sounds.
If you want an unconditional change — one that doesn’t depend on neighbouring sounds — you can leave out the environment. This is an unconditional sound change:
caught-cot-merger:
ɔ => ɑ
This change turns all [ɔ] sounds into [ɑ], wherever they appear in the word.
Note
I’ve indented the examples on this page to make them easier to read, but Lexurgy doesn’t actually care about indentation. Your rules will still work even if all the lines are left-justified.
Special Characters¶
Lexurgy works in Unicode, so it can seamlessly handle IPA characters and other special characters. You just need a way to enter them: either copy them from a site like this, or install an IPA keyboard layout like this one.
While Lexurgy handles exotic characters with ease, it’ll have trouble if you try to write rules using characters like “+”, “$”, “!”, and other characters that mean something special to Lexurgy. If you need to use one of these characters as a phonetic symbol (say you’re making a click language and want to use “!”), you have to put a backslash in front of it to tell Lexurgy that you mean a literal [!]:
click-loss:
\! => t
Using Digraphs¶
If you want to use digraphs in your rules, you can declare symbols:
Symbol ts, dz
Then rules that affect t or s will ignore them if they’re part of a ts, and similarly with dz.
Combining Similar Rules with Alternative Lists¶
Often, several sounds are affected by a change, or trigger it. You could write a bunch of separate rules listing each possibility, but to make this faster and easier, Lexurgy provides alternative lists.
For example, it’s common to have a change happen only between vowels. In a language with the classic five-vowel system, that would be 25 combinations, which would be very tedious to write out in full. Instead of listing them out, you use alternative lists, like this:
rhotacization:
s => r / {a, e, i, o, u} _ {a, e, i, o, u}
You can also use alternative lists in the input and output parts of the rule. If only the input is an alternative list, all of the sounds in it will change into the single new sound:
glottal-stop:
{p, t, k} => ʔ / {a, e, i, o, u} _ {a, e, i, o, u}
If both are alternative lists, they must be the same length, and each sound in the old list will change into the corresponding sound in the new list:
intervocalic-lenition:
{p, t, k} => {b, d, ɡ} / {a, e, i, o, u} _ {a, e, i, o, u}
This will turn [p] into [b], [t] into [d], and [k] into [ɡ].
It’s invalid to have a single sound turn into an alternative list, or to have a list of one length turn into a list of a different length.
The alternative list mechanism can also be used to specify multiple environments. For example, this rule converts stops into fricatives both between vowels and before other stops:
frication:
{p, t, k} => {f, θ, x} / {{a, e, i, o, u} _ {a, e, i, o, u}, _ {p, t, k}}
Note the two underscores, one in each environment. Alternative environments are the only time you can have multiple underscores in the same expression!
Sound Classes¶
If you find yourself using the same alternative lists over and over again, you can give them convenient names using sound classes. The above examples would be much less repetative if we defined these classes:
Class vowel {a, e, i, o, u}
Class unvcdstop {p, t, k}
Class vcdstop {b, d, ɡ}
Now we can write intervocalic lenition like this:
intervocalic-lenition:
@unvcdstop => @vcdstop / @vowel _ @vowel
You need to include the @ sign to make it clear that you’re talking about the class called vowel, not the literal sounds /vowel/.
You can use the names of previously defined classes in your class definitions:
Class stop {p, t, k}
Class fricative {f, s}
Class obstruent {@stop, @fricative}
Now the obstruent
class expands to {p, t, k, f, s}
. You can
use this to build up larger and larger classes of sounds
without repeating yourself.
Lexurgy’s sound classes should be familiar to users of Rosenfelder’s SCA.
Word Boundaries¶
You can specify that a rule only applies at the beginning or end of a word by marking the word boundary with a dollar sign:
aspirate-initial-stop:
{p, t, k} => {pʰ, tʰ, kʰ} / $ _
final-vowel-to-schwa:
{a, e, i, o, u} => ə / _ $
The first applies only at the beginning of a word, the second at the end of a word.
Deleting and Inserting Sounds¶
If you want to delete a sound entirely, put an asterisk in the result side:
drop-final-vowel:
{a, e, i, o, u} => * / _ $
Similarly, you can add epenthetic sounds by putting an asterisk on the match side and specifying the environment where the sound should appear:
spanish-e:
* => e / _ s {p, t, k}
When using asterisks on the match side, be sure to specify a condition! Not using a condition causes the sound to be inserted everywhere, turning e.g. scola into eseceoeleae — probably not what you want!
Exceptions¶
Sometimes it’s easier to say when a change doesn’t occur than when it does. You can specify exceptions to a rule using a double slash:
final-vowel-loss:
e => * / _ $ // {p, t, k} _
This rule drops a final [e], except after a voiceless stop.
Wildcards¶
The symbol []
matches any single sound. It’s often useful for
“skipping” a certain number of sounds. For example, the following
rule deletes schwas, but not in the last two sounds of a word:
drop-schwas:
ə => * / _ [] []
It works by requiring there to be two sounds (any two sounds) after the schwa.
Comments¶
Any line that starts with #
is a comment, and Lexurgy will ignore it.
Comments can help you remember why you wrote your sound changes a certain
way, especially if you had to use a complicated rule that’s hard to
understand just by looking at it.
You can put comments on their own, inside rules, or even at the ends of lines:
# These rules palatalize k to s before i in three steps.
palatalization-1:
k => tʃ / _ i
palatalization-2:
# This is step 2!
tʃ => ʃ
palatalization-3:
ʃ => s # This is step 3!
Basican¶
What you’ve seen so far is enough to understand the Basican example.
You can do a lot with just the basic structures, but they barely scratch the surface of what Lexurgy offers.
Intermediate Structures¶
Multiple-Segment Rules¶
A rule can affect a sequence of consecutive sounds at the same time. For example, this rule implements compensatory lengthening when a coda stop consonant is lost:
Class consonant {p, t, k, s, m, n, l}
coda-stop-drop-and-lengthen:
{a, e, i, o, u} {p, t, k} => {aː, eː, iː, oː, uː} * / _ @consonant
Lexurgy lines up {a, e, i, o, u}
with {aː, eː, iː, oː, uː}
and
{p, t, k}
with *
, and applies both changes.
In most cases, the number of elements must be the same on each side of the =>
.
If a change adds or deletes some of the sounds, fill in the missing spaces
with asterisks.
Simultaneous Expressions¶
You can put multiple expressions in a single rule:
Class nasal {m, n}
post-nasal-lenition:
{b, d, ɡ} => * / @nasal _
{p, t, k, f, x} => {b, d, ɡ, h, h} / @nasal _
When a rule has several expressions, each expression runs simultaneously, so later expressions don’t see the results of earlier expressions. This means we could actually reverse the order of the expressions above without changing the result:
post-nasal-lenition:
{p, t, k, f, x} => {b, d, ɡ, h, h} / @nasal _
{b, d, ɡ} => * / @nasal _
Even though the first expression changes [p] into [b], this isn’t visible to the second expression, so the second expression won’t delete the resulting [b]’s.
The order of expressions only matters if two expressions try to change the same part of the word; in these situations, the earliest expression takes precedence. This can be useful for making rules that do one thing in most cases, and another thing in some exceptional case:
k-shift:
k => s / _ {e, i}
k => h / $ _
The first expression
blocks the second from changing k
to h
before e
and i
(by changing it to s
instead).
Compound Rules¶
You can combine two or more rules under one name by putting Then:
between the expressions. For example, the palatalization rule from
above could be rewritten as:
palatalization:
k => tʃ / _ i
Then: tʃ => ʃ
Then: ʃ => s
If you put more than one expression between each Then:
, then those
expressions run simultaneously, just like in an ordinary rule.
Compound rules help keep the sound changes organized by grouping related changes together under one name.
Romanization¶
It’s a good idea to do all the sound changes in phonetic notation (e.g. IPA). But you probably do most of the work for your languages in their romanization systems. Lexurgy has conventions for converting between IPA and romanizations so that you don’t confuse romanization rules with actual sound changes.
If the input language has a romanization system, the first rule should
be called “deromanizer” and contain all of the rules for converting from
the input language’s romanization to phonetic notation. If you make this
a compound rule (by separating expressions with Then:
), you can do
complex sequences of transformations within a single “deromanizer” rule.
Similarly, if the output language has a romanization system, the last rule should be called “romanizer” and contain all of the rules for converting from phonetic language to the output language’s romanization.
Intermediate Romanizers¶
If you want to preserve the history of the language at several stages, you can use intermediate romanizers. Any rule whose name starts with “romanizer-” will be treated as an intermediate romanizer.
Intermediate romanizers differ from ordinary rules in an important way: subsequent rules don’t see the changes they make. Instead, their results are included in the output along with the final form of the word. Let’s revisit the three-stage palatalization rule, but put in some intermediate romanizers:
palatalization-1:
k => tʃ / _ i
romanizer-old-examplish:
tʃ => ch
palatalization-2:
tʃ => ʃ
romanizer-middle-examplish:
ʃ => sh
palatalization-3:
ʃ => s
Suppose you pass the word “kinoki” to these sound changes. It will go through the changes as follows:
The rule
palatalization-1
executes, turning “kinoki” into “tʃinotʃi”.The rule
romanizer-old-examplish
executes, turning “tʃinotʃi” into “chinochi”.The rule
palatalization-2
executes, but it still sees “tʃinotʃi”; it turns this into “ʃinoʃi”.The rule
romanizer-middle-examplish
executes, turning “ʃinoʃi” into “shinoshi”.The rule
palatalization-3
executes, but it still sees “ʃinoʃi”; it turns this into “sinosi”.
The output of this sound changer would look something like this:
kinoki => chinochi => shinoshi => sinosi
If the intermediate “romanizer” should just dump the phonetic forms, you can use the special rule “unchanged”:
romanizer-phonetic:
unchanged
Note
For the command-line tool, you need to specify the -m
command-line argument
in order for intermediate romanizers to activate.
Using Features¶
We saw above how using alternative lists and sound classes helps make sound changes less repetitive and more compact. But Lexurgy provides another way of doing this: features. Features take more effort to set up, but they allow certain kinds of rules, like assimilation and stress rules, to be written much more simply than would be possible with sound classes.
Binary Features¶
Let’s look at an example of a simple vowel space defined using binary features, as used in distinctive feature theory:
Feature low, high
Feature front, back
Symbol a [+low -high -front -back]
Symbol e [-low -high +front -back]
Symbol i [-low +high +front -back]
Symbol o [-low -high -front +back]
Symbol u [-low +high -front +back]
This example defines four features: low
, high
, front
,
and back
, representing the position of the tongue
in the mouth when making that vowel. Each of these features has two values: low
has the value +low
for low vowels like [a], and the
value -low
for non-low vowels. We then define
a symbol for each vowel we want to use, and give each
vowel a feature matrix, showing which feature values
that vowel has. Now, anytime Lexurgy encounters an [e]
in a word, it knows that that [e] is a front vowel, but
not a low, high, or back vowel.
With these definitions, we can write a rule like this:
final-vowel-raising:
[-low -high] => [+high] / _ $
This rule says that any mid vowel (non-low, non-high) at
the end of a word becomes the corresponding high vowel:
[e] becomes [i], and [o] becomes [u]. The matrix [+high]
on the result side of the rule means that the high
feature, and only the high
feature, will be changed
to +high
, while all other features (like the front
and back
features) are left unchanged.
Note
Each symbol must have a distinct matrix — you can’t define both
[ɛ] and [e] as just [-low -high +front -back]
. You have to
add some kind of distinguishing feature; an ATR
feature could be used to distinguish these two sounds.
Absent Values¶
Binary features actually have a third value: absent,
written with a *
before the feature name. (This
makes the name “binary” rather suspect… sorry, I didn’t
invent the terminology!) So the
absence of the low
feature is written *low
and
the absence of the back
feature is written *back
.
Any character that you haven’t explicitly given a value
for the low
feature automatically has *low
.
This would most likely be the case with any consonant
sounds, where the low
feature is irrelevant.
You can use absent features in rules just like any other feature value.
Note
Any characters in an input word that don’t match symbols are considered to
lack all features, so they’ll only match matrices consisting entirely
of absent features, like [*low *front]
, or the empty matrix []
.
Univalent Features¶
You can also define univalent features, which only have two values: present and absent.
The following defines two univalent features:
Feature +nasalized
Feature +stress
With these definitions, the feature stress
has the two values
+stress
and -stress
. Any sound that
isn’t explicitly declared to be +stress
is
automatically -stress
. There’s no separate *stress
value; -stress
is the absent value.
Univalent features are convenient for suprasegmentals
like stress, because it would be annoying to have
to declare every single vowel to be [-stress]
.
Multivalent Features¶
Lexurgy differs from tools like Phonix (and from distinctive feature theory) by allowing features with any number of values. For example, you can recreate the IPA consonant chart like this:
Feature voicing(unvoiced, voiced)
Feature place(labial, dental, alveolar, velar, glottal)
Feature manner(stop, fricative, nasal, approximant)
Symbol p [unvoiced labial stop]
Symbol b [voiced labial stop]
Symbol t [unvoiced dental stop]
Symbol d [voiced dental stop]
Symbol k [unvoiced velar stop]
Symbol ɡ [voiced velar stop]
Symbol f [unvoiced labial fricative]
Symbol v [voiced labial fricative]
Symbol ð [voiced dental fricative]
Symbol s [unvoiced alveolar fricative]
Symbol z [voiced alveolar fricative]
Symbol x [unvoiced velar fricative]
Symbol ɣ [voiced velar fricative]
Symbol h [unvoiced glottal fricative]
Symbol m [labial nasal]
Symbol n [alveolar nasal]
Symbol l [alveolar approximant]
This defines three features, voicing
, place
and manner
, with
two, five, and four values respectively. With multivalent features, each value has a name;
rather than writing [+place]
or [-manner]
, which wouldn’t make
sense, you have to use the names, like [labial nasal]
.
With these definitions, you can write rules like this:
Class vowel {a, e, i, o, u}
intervocalic-lenition:
[unvoiced stop] => [voiced] / @vowel _ @vowel
[voiced stop] => [fricative] / @vowel _ @vowel
[unvoiced fricative] => h / @vowel _ @vowel
h => * / @vowel _ @vowel
Just like binary and univalent features, multivalent features always
have an absent value. In this example, we didn’t specify a voicing for [m], [n], and [l],
so they automatically have the absent value *voicing
.
Feature Variables¶
Languages often undergo assimilation, where one sound becomes more like a nearby sound. Lexurgy helps in writing assimilation rules by allowing feature variables, which copy a feature value from one sound to another.
For example, suppose you have these declarations:
Feature type(*vowel, consonant)
Feature place(labial, alveolar, velar, glottal)
Feature manner(stop, fricative, nasal, approximant)
Symbol p [labial stop consonant]
Symbol t [alveolar stop consonant]
Symbol k [velar stop consonant]
Symbol s [alveolar fricative consonant]
Symbol m [labial nasal consonant]
Symbol n [alveolar nasal consonant]
Symbol ŋ [velar nasal consonant]
Symbol l [alveolar approximant consonant]
Then you can write the common nasal assimilation rule like this:
nasal-assimilation:
[nasal] => [$place] / _ [consonant $place]
The [consonant $place]
matrix in the environment matches any consonant, but captures the
value of that consonant’s place
feature. This feature value is copied into the matching
$place
in the output matrix. So a nasal before a [p] would have the matrix [labial]
applied to it and become an [m], a nasal before a [d] would have the matrix [alveolar]
applied and become an [n], etc.
Warning
Beware of matrices containing only feature variables, like [$place]
or
[$low $front]
, in the match or environment portion of a rule. Feature
variables copy the absent value too, so these matrices will match
anything and blithely copy absent values where they don’t belong. For
example, it’s tempting to write the above nasal assimilation rule like this:
[nasal] => [$place] / _ [$place]
But this rule will apply to a word like “nato”, copying the implicit *place
value from the [a] onto the [n] and resulting in a nasal with no place of
articulation. If Lexurgy complains with an error message like
“No combination of a symbol and diacritics has the matrix [nasal]”,
it’s probably run into exactly this problem!
Giving the Absent Value a Name¶
You can give the absent value of any multivalent feature a name. This declaration
allows unstressed
to be used instead of *stress
to indicate a lack
of both primary and secondary stress:
Feature stress(*unstressed, primary, secondary)
Even though it has a name, unstressed
still behaves like an absent value;
all sounds that don’t have stress indicated explicitly are unstressed
.
Negated Features¶
A feature value in a matrix can be negated by prefixing it with !
. Then the matrix
will match any sound that doesn’t have that value. For example, [stop !glottal]
will match any stop except the glottal stop, while [vowel front !high]
will match
non-high back vowels.
You can’t use negated features on the result side of a rule; if you try to transform
a sound to [stop !glottal]
, Lexurgy doesn’t know what you want
the place of articulation to be (is it alveolar
? velar
?), only what you want
it not to be.
Diacritics¶
The IPA indicates some features explicitly using diacritics: [ʰ] indicates aspiration, [ː] makes a vowel long, and [ ̥ ] makes a sound voiceless.
You can declare these in Lexurgy like this:
Feature +aspirated, +long, voiced
Diacritic ʰ [+aspirated]
Diacritic ː [+long]
Diacritic ̥ [-voiced]
If these diacritics appear in the input words or in rules, Lexurgy will
consider them to add the specified feature value to the modified sound, replacing
the existing value of that feature. For example, if
[p] is [-voiced bilabial stop]
, then [pʰ] is [+aspirated -voiced bilabial stop]
;
if [n] is [+voiced alveolar nasal]
, then [n̥] is [-voiced alveolar nasal]
.
Diacritics also work when translating matrices back into symbols: if a rule produces
[-voiced alveolar nasal]
, and there’s no symbol explicitly defined with that matrix,
Lexurgy will search through possible combinations of symbols and diacritics to find
one that fits the matrix, namely [n̥].
If you add (before)
to a diacritic declaration (before or after the
matrix), it will go before the base symbol. For example, if you define
Diacritic ⁿ (before) [+prenasalized]
(or Diacritic ⁿ [+prenasalized] (before)
),
then the prenasalized version of [d] will show up as ⁿd
rather than dⁿ
.
Similarly, if you add (first)
to a diacritic declaration,
it will go after the first character of the symbol. This is
useful for applying diacritics to diphthongs: applying
́
to the symbol aj
will produce the undesirable aj́
. Declaring
it (first)
will produce áj
instead. (On single-character symbols,
“first” diacritics act the same as “after” diacritics.)
Diacritics can even be applied to symbols that aren’t declared with feature matrices, in which case you can change the diacritics using matrix rules but not the base symbol. For example, this rule will turn the sequence [ar] into [aː], even without a symbol definition for [a]:
Feature +long
Diacritic ː [+long]
a-before-r:
a r => [+long] *
But if you wanted to change the a
into a different vowel using matrix
rules, you would have to declare it as a symbol with a feature matrix.
Floating Diacritics¶
Some diacritics indicate suprasegmentals or other features that aren’t integral to the sound. While most languages would treat [p] and [pʰ] as entirely different sounds (if they’re distinguished at all), a feature like stress or tone is added on top of a vowel sound without affecting its nature much. As a result, most sound changes should ignore the feature.
You can indicate this by making the diacritic floating:
Feature +hightone, +stress
Diacritic ˈ (floating) [+stress]
Diacritic ́ (floating) [+hightone]
Literal sounds without floating diacritics match sounds with or without floating diacritics, and transmit any floating diacritics unaltered to the output. For example, suppose that we write this rule:
mid-raising:
{e, o} => {i, u}
This will turn kepo
into kipu
, but also keˈpó
into kiˈpú
.
On the other hand, literal sounds with the floating diacritic only match sounds with the diacritic. Suppose we write this rule instead:
mid-raising:
{eˈ, oˈ} => {iˈ, uˈ}
This rule will leave kepo
unaltered because the vowels aren’t stressed, but will turn
keˈpó
into kiˈpó
.
If you really want a literal sound without floating diacritics to only accept exact matches,
put !
after the sound:
mid-raising:
{e!, o!} => {i, u}
This will turn kepo
into kipu
, but leave keˈpó
unaltered.
Multiple Criteria¶
You can force Lexurgy to match multiple criteria on the same segment using
an intersection, which is notated by joining the criteria with &
.
The most common use of this is when you’re mixing sound classes with features,
and need to specify that a rule only applies when a sound both has a feature
and is in a class:
Feature +stress
Diacritic ˈ (floating) [+stress]
Class vowel {a, e, i, o, u}
unstressed-final-vowel-loss:
@vowel&[-stress] => * / _ $ // {p, t, k} _
If an alternative list is the first element in an intersection, then it can match up with an alternative list of the same length on the result side of the rule. For example:
Feature +stress
Diacritic ˈ (floating) [+stress]
unstressed-vowel-centralizing:
{e, i, o, u}&[-stress] => {ə, ɨ, ə, ɨ}
Optional and Repeated Segments¶
You can mark part of the environment optional by putting a question mark after it:
Feature +stress
Diacritic ˈ (floating) [+stress]
Class vowel {a, e, i, o, u}
Class consonant {p, t, k, s, m, n, l}
Class glide {j, w}
stress-closed-last-syllable:
@vowel => [+stress] / _ @glide? @consonant $
This rule will stress the vowel in a final closed syllable, even if there’s an offglide like [j] or [w] after the vowel.
If the language has a more complex syllable structure though, this won’t be enough; it won’t match a word like [krajsk]. To deal with that case, you can use a repeated segment:
stress-closed-last-syllable:
@vowel => [+stress] / _ @glide? @consonant+ $
The +
indicates that we want at least one consonant at the end of the word.
If the repeated segment is also optional (i.e. the rule should accept zero or more
copies of the segment), you can use *
instead of +
. For example, this
rule will stress the vowel in the last syllable regardless of whether there are
any glides or consonants at the end:
stress-last-syllable:
@vowel => [+stress] / _ {@glide, @consonant}* $
You can get even more specific about the number of allowed repetitions by using numbers after the *. This rule will stress the last syllable only if there are exactly two coda consonants:
stress-last-syllable-picky:
@vowel => [+stress] / _ {@glide, @consonant}*2 $
And this rule will stress the last syllable if there are between one and three coda consonants:
stress-last-syllable-less-picky:
@vowel => [+stress] / _ {@glide, @consonant}*(1-3) $
If an entire sequence of sounds is optional or repeated, put those sounds in parentheses so it’s clear that the repeater applies to all the sounds together, not just the last one. For example, if you wanted to assign stress to the last syllable if it’s open or if there’s both a glide and a consonant in its coda, you could do this:
stress-last-syllable-complicated:
@vowel => [+stress] / _ (@glide @consonant)? $
Warning
Overly complicated combinations of optionals and repeaters can force Lexurgy to try too many possibilities. Take this rule:
silly:
(x+ x+)+ => *
If you apply this rule to a word with a long sequence of x’s
(say, soxxxxxxxxxxxxx
), then Lexurgy doesn’t know how to
divide up the x’s between the repeaters: it could be
[(xxx)(xx)][(xxxxx)(xxx)]
, or [(x)(xx)][(xxx)(xxxx)][(xxx)]
,
or any number of other possible combinations. Instead of running
for a long time and hanging the app, Lexurgy will give up and report
a “too many possibilities” error.
Intermediatese¶
Now you should understand enough to be able to follow along with the Intermediatese example.
With that, it’s time to get to the power features that really make Lexurgy stand out.
Advanced Structures¶
Reusable Elements¶
If you find yourself repeating part of an expression across many rules,
consider declaring it as an Element
.
Element declarations are similar to class declarations, but they can contain anything that you could find in an expression:
Element scluster (s @stop)
Element longorhigh {[+long], [+hightone]}
Once declared, you can reference elements in rules (or other elements),
using @
just like class references:
Feature +long, +hightone, +stress
Diacritic ː (floating) [+long]
Diacritic ́ (floating) [+hightone]
Diacritic ˈ (floating) (before) [+stress]
Class stop {p, t, k}
Element scluster s @stop
Element longorhigh {[+long], [+hightone]}
epenthetic-e:
* => e / _ @scluster
stress:
@longorhigh => [+stress]
decluster-s:
@scluster => * @stop
Note
Most sound classes could equally well be declared as
elements; e.g. Element stop {p, t, k}
does the same thing
as Class stop {p, t, k}
. However, referencing a class in
an element declaration is slightly different from referencing a
class from another class: Class obstruent {@stop, s}
expands to
{p, t, k, s}
, but Element obstruent {@stop, s}
expands to
{{p, t, k}, s}
. This matters if you’re trying to line
up corresponding sounds!
Hierarchical Rules¶
Some kinds of sound changes are best expressed as a hierarchy of rules, where only the first condition that matches matters. The classic example of this is stress rules. For example, a language might have the following stress rule: a word is stressed on the last long vowel if one exists; otherwise on the last closed syllable if one exists; otherwise on the last syllable. To do this with the structures seen so far, you’d have to write something like this:
Feature +stress, +long
Diacritic ˈ (floating) [+stress]
Diacritic ː (floating) [+long]
Class vowel {a, e, i, o, u}
Class cons {p, t, k, s, m, n, l}
assign-stress:
[+long] => [+stress] / _ [-long]* $
Then:
@vowel => [+stress] / $ [-stress]* _ @cons (@cons? @vowel&[-stress])* $
Then:
@vowel => [+stress] / $ [-stress]* _ @cons* $
We have to awkwardly tell the second and third expressions not to apply
if there’s already stress somewhere else in the word. Otherwise, a
word like teːpunsa
will trigger all three expressions in different
syllables, leading to all three syllables being stressed. Not helpfulǃ
Fortunately, there’s a better way. Just use Else:
instead of Then:
:
Feature +stress, +long
Diacritic ˈ (floating) [+stress]
Diacritic ː (floating) [+long]
Class vowel {a, e, i, o, u}
Class cons {p, t, k, s, m, n, l}
assign-stress:
[+long] => [+stress] / _ [-long]* $
Else:
@vowel => [+stress] / _ @cons (@cons? @vowel)* $
Else:
@vowel => [+stress] / _ @cons* $
Now we don’t have to include the extra [-stress]
conditions.
When this rule is applied to a word like teːpunsa
, then as
soon as the first expression matches the [eː], that one sound is
changed and then the whole rule exits, so nothing else gets stressed.
If the first vowel is short, though, then the first expression doesn’t
match anything, so the second expression gets a chance to apply, putting
stress on the [u]. In general, subsequent Else:
rules are
only applied if all the previous ones didn’t match the word.
If you need to use Then:
and Else:
in the same rule, use
parentheses to indicate whether the Then:
is inside the Else:
or vice versa. Here’s a variation on the above rule that restricts
stress to one of the last three syllables using a temporary diacritic:
Feature +stress, +long, +lastThree
Diacritic ˈ (floating) [+stress]
Diacritic ː (floating) [+long]
Diacritic ³ (floating) [+lastThree]
Class vowel {a, e, i, o, u}
Class cons {p, t, k, s, m, n, l}
assign-stress:
@vowel => [+lastThree] / _ @cons* @vowel? @cons* @vowel? @cons* $
Then: (
[+long +lastThree] => [+stress] / _ [-long]* $
Else:
[+lastThree] => [+stress] / _ @cons (@cons? @vowel)* $
Else:
[+lastThree] => [+stress] / _ @cons* $
)
Then:
[+lastThree] => [-lastThree]
Here, the parentheses indicate that the [+lastThree]
is added
before the entire Else:
block, and removed after the entire
Else:
block.
Gemination and Metathesis¶
Sometimes it’s useful to copy an entire sound from one place to another, rather than just a feature. Common cases where copying sounds is useful include gemination (duplication of a sound) and metathesis (switching of sounds).
To capture a sound, put a capture variable immediately after the pattern that matches it.
A capture variable looks like a dollar sign followed by a number: $1
, $2
, etc.
Once a sound has been captured, you can use the capture variable alone to produce or
recognize a copy of the sound.
This rule applies gemination in stop-stop clusters, turning the first stop into a copy of the second:
Class stop {p, t, k}
gemination:
@stop @stop$1 => $1 $1
This rule applies metathesis to stop-fricative sequences:
Class stop {p, t, k}
Class fricative {f, s}
metathesis:
@stop$1 @fricative$2 => $2 $1
This rule uses a capture variable in the environment to recognize a geminate:
Class consonant {p, t, k, s, m, n, l}
geminate-epenthesis:
* => e / _ @consonant$1 $1
This rule uses a bare capture variable on the match side of the rule to remove gemination (degemination):
Class consonant {p, t, k, s, m, n, l}
degemination:
@consonant$1 $1 => $1 *
Warning
Unlike most other elements, captures are picky about floating diacritics
by default. They only match sounds that are exactly the same as the
sound that was originally captured. If you want a capture reference
to ignore floating diacritics, put a ~
before it: ~$1
matches
whatever was captured by $1
plus or minus any floating diacritics.
Negation¶
You can negate rule elements — match only segments that don’t fit
the element — by preceding the element with !
, as with matrix features.
You can do this with all sorts of elements. For example:
!r
matches anything but the sound [r] (or its floating-diacritic variants).!{a, i, u}
matches anything except the sounds [a], [i], and [u].!@vowel
matches anything that isn’t in thevowel
class.![+high +back vowel]
matches anything except a back high vowel; this is different from[!+high !+back vowel]
, which only matches vowels that are neither high nor back (e.g. the former matches the high front vowel [i] while the latter doesn’t).!$
means “except at a word boundary”, e.g./ <stuff> _ <more stuff> !$
is the same as/ <stuff> _ <more stuff> // _ <more stuff> $
.!.
means “but only if there isn’t a syllable break here”, e.g.s !. t
matches the sequence [st] inban.stu
but not inbas.tu
.
However, negating something that might match more than one sound is only allowed in certain
situations. Take !(@consonant @vowel)
. This is supposed to mean “match anything that isn’t
a consonant followed by a vowel”. But what should it match? Should it match [k] on its own?
Should it only match sequences like [kt] and [ap] that consist of two sounds? You might
want different behaviour in different cases.
So something like !(@consonant @vowel)
is only valid if:
It’s on the edge of a condition:
a / _ !(@consonant @vowel)
is the same asa // _ @consonant @vowel
,OR it’s immediately after
&
:([] [])&!(@consonant @vowel)
matches any two segments except for a consonant followed by a vowel.
In general, if you get errors when trying to negate something, specify exactly what
kinds of matches are allowed using &
.
Nested Environments¶
Environments can be nested inside other structures. For example, the following is a compact way of writing the rule “voiced stops word finally, and aspirated stops unconditionally, both become voiceless stops”:
nested-environment:
{{b, d, g} / _ $, {pʰ, tʰ, kʰ}} => {p, t, k}
Rule Filters¶
Some rules only care about certain kinds of sounds, ignoring any intervening sounds. This is most common with rules affecting vowels, such as stress, vowel harmony, and tone.
You can make such rules more concise by defining a filter on the rule. A rule with a filter will treat sounds that don’t match the filter as if they didn’t exist.
For example, a rule that assigns stress to the vowel in the first symbol could be written like this:
Feature +stress
Diacritic ˈ (floating) [+stress]
Class vowel {a, e, i, o, u}
Class consonant {p, t, k, s, m, n, l}
stress-first-syllable:
@vowel => [+stress] / $ @consonant* _
But any consonants before the vowel are actually irrelevant to this rule, so the
@consonant*
in the environment is a distraction. Instead, you can write it like this:
stress-first-syllable @vowel:
[] => [+stress] / $ _
Note that we can use []
on the match side instead of @vowel
because anything
that passes the filter will already be a vowel, so we don’t need to test for vowelhood
again.
Similarly, a short-distance vowel harmony rule could be written like this:
Feature type(*consonant, vowel)
Feature height(low, mid, high)
Feature frontness(front, central, back)
Feature rounded
Symbol a [low central -rounded vowel]
Symbol e [mid front -rounded vowel]
Symbol i [high front -rounded vowel]
Symbol ø [mid front +rounded vowel]
Symbol y [high front +rounded vowel]
Symbol ɤ [mid back -rounded vowel]
Symbol ɯ [high back -rounded vowel]
Symbol o [mid back +rounded vowel]
Symbol u [high back +rounded vowel]
vowel-harmony [vowel]:
[!central] => [$frontness] / [!central $frontness] _
Propagation¶
Notice the word “short-distance” in the description of the previous example. As written, it would only apply vowel harmony one vowel at a time, turning e.g. [sinotehu] into [sinøtɤhy], which isn’t harmonious at all.
When faced with a change that acts over arbitrarily long distances, such as
vowel harmony and stress rules, you can use propagating rules. A propagating
rule is marked by propagate
after the rule name (and after any filter).
Lexurgy will apply propagating rules repeatedly until the word stops changing.
Propagation is all that’s needed to turn the vowel harmony rule into a long-distance rule:
vowel-harmony [vowel] propagate:
[!central] => [$frontness] / [!central $frontness] _
You can also put propagate
on a Then:
block to propagate
only the statements in the block.
Warning
It’s impossible to tell in general whether a propagating rule will ever terminate. So Lexurgy is conservative and stops with an error message if a rule runs a hundred times without settling on a result.
Now you should be able to follow the Advancedish example.
Left-to-Right and Right-to-Left Rules¶
Lexurgy also supports directional variants of propagation.
If you mark a rule as ltr
, then it will be applied exactly once
at each sound in the word, moving from left to right. Rules marked
as rtl
are similar, but move from right to left.
Since the vowel harmony rule from the previous section propagates exclusively from left to right, it could equally well have been written as:
vowel-harmony [vowel] ltr:
[!central] => [$frontness] / [!central $frontness] _
Deferred Rules¶
If you find yourself applying the same sound changes at several
stages of your language’s evolution, consider turning them into
deferred
rules. A deferred rule isn’t applied where it’s declared;
instead, you apply it later as part of other rules.
For example, suppose you have a nasal assimilation rule:
Class nasal {m, n, ŋ}
nasal-assimilation:
@nasal => m / {p, b}
@nasal => n / {t, d}
@nasal => ŋ / {k, g}
Then you go on developing your language, and at some point you have a vowel deletion rule:
Class vowel {a, e, o, i, u}
delete-pretonic-mid-vowels:
{e, o} => * / @vowel []* _ ([]* @vowel)*2 []* $
But this reintroduces nasal-stop clusters at different places
of articulation: your word anepato
becomes anpato
. You
don’t like this, so you apply the nasal assimilation rule again.
Now you have these sound changes:
Class nasal {m, n, ŋ}
Class vowel {a, e, o, i, u}
nasal-assimilation:
@nasal => m / _ {p, b}
@nasal => n / _ {t, d}
@nasal => ŋ / _ {k, g}
delete-pretonic-mid-vowels:
{e, o} => * / @vowel []* _ ([]* @vowel)*2 []* $
nasal-assimilation-again:
@nasal => m / _ {p, b}
@nasal => n / _ {t, d}
@nasal => ŋ / _ {k, g}
That works, but it’s annoying; this entire rule appears twice, cluttering up your sound changes. It also means that if you later find a mistake in this rule, you have to remember to fix the mistake in both places.
You can get rid of this repetition by declaring nasal assimilation as a deferred rule:
nasal-assimilation defer:
@nasal => m / {p, b}
@nasal => n / {t, d}
@nasal => ŋ / {k, g}
Now you can apply this rule whenever you need it using the syntax
:nasal-assimilation
:
Class nasal {m, n, ŋ}
Class vowel {a, e, o, i, u}
nasal-assimilation defer:
@nasal => m / _ {p, b}
@nasal => n / _ {t, d}
@nasal => ŋ / _ {k, g}
initial-nasal-assimilation:
:nasal-assimilation
delete-pretonic-mid-vowels:
{e, o} => * / @vowel []* _ ([]* @vowel)*2 []* $
nasal-assimilation-again:
:nasal-assimilation
This is an improvement, but it could be a trap in the future:
if you later reorder your rules, you might accidentally move
delete-pretonic-mid-vowels
somewhere else and forget to move
nasal-assimilation-again
along with it. Since these two rules
belong together (the second cleans up a mess made by the first),
why not combine them into one rule?
delete-pretonic-mid-vowels:
{e, o} => * / @vowel []* _ ([]* @vowel)*2 []* $
Then:
:nasal-assimilation
As this example illustrates, you can combine references to deferred rules with ordinary expressions.
Deferred rules can contain all the structures normal rules can,
including Then:
and Else:
blocks, filters,
and modifiers like propagate
.
Note
If a deferred rule contains a Then:
or Else:
, any
references to it must be separated from adjacent expressions
by a Then:
or Else:
.
Cleanup Rules¶
Deferred rules can help reduce repetition if you need to apply
the same rule multiple times. But sometimes you don’t need that
much precision; you want something like nasal assimilation to
apply after every rule, to clean up any illicit clusters.
You can do this by declaring nasal-assimilation
to be a
cleanup
rule:
Class nasal {m, n, ŋ}
Class vowel {a, e, o, i, u}
nasal-assimilation cleanup:
@nasal => m / _ {p, b}
@nasal => n / _ {t, d}
@nasal => ŋ / _ {k, g}
delete-pretonic-mid-vowels:
{e, o} => * / @vowel []* _ ([]* @vowel)*2 []* $
Cleanup rules run once when declared, and then again after every subsequent named rule.
If you want a cleanup rule to stop applying at some point, you can turn it off using this syntax:
nasal-assimilation:
off
Interactions Between Words¶
If you write multiple words on a line in the input (separated by spaces), each will be treated as a separate word. For example, if your rules are:
drop-final-t:
t => * / _ $
and your words are:
sit amet
Then the result will be si ame
; the space between the words
is treated as a word boundary. Similarly, rules that look for
a specific environment won’t look across word boundaries.
Affixation¶
When creating a language diachronically, you may want to simulate how grammatical particles turn into affixes. This means you need the words to be treated as separate before the affixation, but as one word after.
You do this with a rule that destroys the space between words, which is
represented by $$
:
glomination:
$$ => *
This can be conditioned on sounds in the neighbouring words. The following will only join words if the first word ends in a [t] or the second word starts with an [s]:
selective-glomination:
$$ => * / t _
$$ => * / _ s
Sandhi Rules¶
You can also condition across word boundaries, creating sandhi rules. For example, this rule implements initial consonant lenition, similar to how it works in Irish:
Class vowel {a, e, i, o, u}
lenition:
{p, t, k, b, d, ɡ} => {f, h, x, v, j, ɣ} / @vowel $$ _ @vowel
Syllables¶
Lexurgy can understand words that are broken up into syllables, using
periods (.
) to separate syllables. You can also specify syllabification
rules and have Lexurgy automatically break up words into syllables.
This capability is demonstrated by the
Syllabian example.
Enabling Syllables¶
To enable syllable-based processing, you have to provide a Syllables:
declaration. Take this sound change:
Class vowel {a, e, i, o, u}
intervocalic-voicing:
{p, t, k} => {b, d, ɡ} / @vowel _ @vowel
This turns e.g. kipo
into kibo
. But
if we put syllable breaks in (ki.po
), the rule stops
working, because the [p] is no longer between two vowels:
it’s between a .
and a vowel. To fix this, add explicit syllables:
Class vowel {a, e, i, o, u}
Syllables:
explicit
intervocalic-voicing:
{p, t, k} => {b, d, ɡ} / @vowel _ @vowel
Now Lexurgy will treat all .
characters in the input as syllable
breaks. It sees the [p] in ki.po
as directly following the
[i] and correctly produces ki.bo
.
If you want to get rid of syllable breaks once you’re done with them, clear syllables:
Class vowel {a, e, i, o, u}
Syllables:
explicit
intervocalic-voicing:
{p, t, k} => {b, d, ɡ} / @vowel _ @vowel
Syllables:
clear
This turns ki.po
into kibo
.
Matching Syllable Breaks¶
You can use the .
character in rules to match syllable boundaries.
For example, it’s common for a rule to affect only coda consonants.
This rule nasalizes vowels before a coda nasal:
Feature +nasalized
Diacritic ̃ (floating) [+nasalized]
Class vowel {a, e, i, o, u}
Class nasal {m, n}
Syllables:
explicit
nasalization:
@vowel => [+nasalized] / _ @nasal .
This turns ban.ta
into bãn.ta
and ton
into tõn
, but
leaves ba.na.na
unchanged. Note that the .
character matches
word boundaries too, not just syllable breaks within a word.
Matching Entire Syllables¶
If you want to match an entire syllable, use <syl>
in the rule.
For example, this rule converts words into tallies of their syllables:
Syllables:
explicit
tally-syllables:
<syl> => I
Syllables:
clear
This converts ki.po
into II
and ba.na.na
into III
.
Syllabification Rules¶
The above assumes that you’ve manually put syllable breaks in the input. But you can have Lexurgy do this for you by specifying what syllable shapes are allowed, using the same syntax as any other rule. Here’s a syllable declaration for a language with strictly open syllables and no clusters:
Class consonant {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
@consonant? @vowel
The pattern @consonant? @vowel
describes the valid syllable
shapes: an optional consonant followed by a mandatory vowel.
This will automatically break up kamina
as ka.mi.na
and ekipoa
as e.ki.po.a
. But if you try to pass in a
word like kantu
that violates the syllable structure,
you’ll get an error.
Here’s a more complicated example that allows certain coda consonants:
Class consonant {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
@consonant? @vowel {m, n, l}?
This will still automatically break up kamina
as ka.mi.na
and ekipoa
as e.ki.po.a
, but it will also allow
words like kantu
and pasel
, breaking them up as kan.tu
and pa.sel
.
This example also illustrates that, when faced with multiple possible
ways of breaking a word into syllables, Lexurgy will always put the
syllable breaks as early as possible. Notice that kamina
could equally well
be broken up as kam.in.a
, since kam
, in
, and a
are
all valid syllables. But breaking it up as ka.mi.na
puts the
syllable breaks earlier.
Resyllabification¶
Syllabification rules are automatically reapplied after every named sound change. Take the example with codas again, but this time let’s add a sound change:
Class consonant {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
@consonant? @vowel {m, n, l}?
drop-stop-after-nasal:
{p, t, k} => * / {m, n} _
A word like kantu
gets syllabified into kan.tu
, and then the
sound change deletes the t
, leaving kan.u
. But after the rule
finishes, the syllabification rule applies again, resulting in ka.nu
.
Changing Syllabification Rules¶
However, sometimes sound changes affect the syllable structure. Let’s add a vowel-deletion change to the language with strictly open syllables:
Class consonant {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
@consonant? @vowel
syncope:
@vowel => * / $ <syl> @consonant _ @consonant
This rule deletes the vowel in the second syllable if it’s
between two consonants. But if you try to apply it to a word like
kamina
, you’ll get an error; the new word, kamna
, now
has an illegal coda consonant!
To solve this, add a new syllable declaration after the syncope rule:
Class consonant {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
@consonant? @vowel
syncope:
@vowel => * / $ <syl> @consonant _ @consonant
Syllables:
@consonant? @vowel @consonant?
Now the words kamina
and ekipoa
correctly become
kam.na
and ek.po.a
, with the syllabification adjusted
to the new structure.
You can also stop resyllabification completely with special syllable declarations. This stops automatic resyllabification but leaves any existing syllable breaks where they are:
Syllables:
explicit
This removes all syllable breaks:
Syllables:
clear
Syllable-Level Features¶
You can declare features and diacritics that operate on entire syllables. Here’s an example:
Feature (syllable) +stress
Diacritic ˈ (before) [+stress]
Syllables:
explicit
stress-shift:
{ɛ, ɔ}&[+stress] => {e, o}
The line Feature (syllable) +stress
declares the stress
feature to be a syllable-level feature. This feature needs a diacritic
to mark it, provided by the line Diacritic ˈ (before) [+stress]
.
The (before)
modifier means that the diacritic must go before the
beginning of the syllable, which is where the IPA puts it. (The default
location is after the end of the syllable.)
If you pass in words like ˈkɛ.tɔ
and kɛ.ˈtɔ
, which differ
only by stress location, the above changes will turn them into
ˈke.tɔ
and kɛ.ˈto
.
You can assign a syllable-level feature to a syllable in a rule by changing any sound in the syllable (or the entire syllable) to a matrix containing the feature. This rule assigns stress on the first syllable by assigning it to the first sound:
Feature (syllable) +stress
Diacritic ˈ (before) [+stress]
Syllables:
explicit
stress-first:
[] => [+stress] / $ _
You can replace the last line with <syl> => [+stress] / $ _
,
applying the rule to the entire first syllable, and the result is the same.
Use whichever strategy works best for your case.
Syllable-Level Features in Syllabification Rules¶
You can also assign syllable-level features directly in the syllabification rules. The following uses the syllabifier to distinguish “heavy” syllables (with a long vowel or coda) from “light” syllables (all others):
Feature (syllable) +heavy, +long
Diacritic ² [+heavy]
Diacritic ː (floating) [+long]
Class consonant {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
@consonant? {@vowel&[+long], @vowel @consonant} => [+heavy]
@consonant? @vowel
The first line finds syllables that end in a long vowel or a vowel
and a consonant, and applies the +heavy
feature (notated with
a ²
). The second line says that consonant-vowel syllables
are also allowed, but these aren’t +heavy
.
This syllabification rule turns kamina
into ka.mi.na
(because all its
syllables are light), but kaːtantu
into kaː².tan².tu
(because the
first two syllables are heavy).
Syllable-level features allow complex stress rules to be written compactly. The following implements the rule “stress the third-last syllable if the last two syllables are light, otherwise stress the second-last syllable”:
Feature (syllable) +heavy, (syllable) +stress, +long
Diacritic ² [+heavy]
Diacritic ˈ (before) [+stress]
Diacritic ː (floating) [+long]
Class consonant {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
@consonant? {@vowel&[+long], @vowel @consonant} => [+heavy]
@consonant? @vowel
assign-stress:
<syl> => [+stress] / _ <syl>&[-heavy] <syl>&[-heavy] $
Else:
<syl> => [+stress] / _ <syl> $
This turns kamina
into ˈka.mi.na
, stressing the first
syllable, but kaːtantu
into kaː².ˈtan².tu
, stressing the second
syllable.
Capturing Syllable Information¶
By default, captures ($1
etc.) copy only sounds, not syllable information.
Take this rule:
Feature (syllable) +stress
Diacritic ˈ (before) [+stress]
Class cons {p, t, k, s, m, n, l}
Class vowel {a, e, i, o, u}
Syllables:
explicit
vowel-swap:
@vowel$1 @cons$2 @vowel$3 => $3 $2 $1
If applied to ˈma.ku
, this produces ˈmu.ka
; the stress stays
on the first syllable instead of moving with the captured [a] to the second
syllable.
If you want to copy syllable information, you can put a period between
the $
and the capture number (e.g. $.1
). This can be useful if
you want to move around whole chunks of words:
Feature (syllable) +stress
Diacritic ˈ (before) [+stress]
Syllables:
explicit
swap-and-glom:
([]+)$1 $$ ([]+)$2 => $.2 . $.1
This will turn kan ˈko.pu
into ˈko.pu.kan
, preserving both the syllable boundaries
and stress from the original pieces. Using ordinary captures would yield kopu.ˈka.n
instead,
as the rule tried to graft the original syllable structure onto the new sounds.