Lomongo

This is an example of a language project. It implements a linguistic generalization regarding morpho-phonology in Lomongo.

This project is an example of a Transducer modeling Morpho-/Phonology.

This project would earn a grade of 1,0.

import Dfst

Description of the generalization

This project is about the interaction of phonological rules in the language Lomongo, a Bantu language spoken in the Democratic Republic of the Congo (Ethnologue entry).

In Lomongo, the second person singular subject affix usually surfaces as the prefix o-, and the third person plural subject affix usually surfaces as the prefix ba-. When the stem they attach to begins with a vowel, they mutate: o- surfaces as w-, and ba- surfaces as b-. Only underlyingly vowel-initial stems trigger these processes, however. There is a general process whereby voiced consonants such as the b of bina are deleted intervocalically. This process can create surface environments which look as though they should trigger the mutation (see the last two rows of the table), but do not. The data come from page 135 of the book Generative Phonology: Description and Theory (1979) by Michael Kenstowicz and Charles Kisseberth (who attribute them to Hulstaert (1957)), where their analysis is given as an exercise.

Imper	2sg	3pl	English
sanga	osanga	basanga	say
kamba	okamba	bakamba	work
ena	wena	bena	see
isa	wisa	bisa	hide
bina	oina	baina	dance
bota	oota	baota	beget

The analysis involves three rules:

prevocalic gliding: A non-low vowel becomes a glide when in front of another vowel \[V_{[-low]} \rightarrow V_{[-syl]}\mathrel{|}\underline{\quad}V \]
vowel deletion: The first in a sequence of two vowels deletes \[V \rightarrow \emptyset\mathrel{|}\underline{\quad}V\]
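intervocalic consonant deletion: A voiced stop deletes intervocalically; nasals, which the implementation treats as continuants, are unaffected (compare the n of ena and bina) \[C_{[+voi,-cont]} \rightarrow \emptyset\mathrel{|}V\underline{\quad}V\]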

These rules need to be ordered with respect to one another in the following way:

prevocalic gliding
vowel deletion
intervocalic consonant deletion

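That both gliding and vowel deletion must apply before consonant deletion is shown by the attested 2sg and 3pl forms for bina and bota. Forms such as oina and baina contain a sequence of two vowels, to which these two rules could have applied. Since the sequence surfaces unchanged, it cannot have existed yet when gliding and vowel deletion applied; consonant deletion must create it only afterwards.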

That vowel deletion must apply after gliding is shown by the hypothesized forms oena and oisa underlying the attested 2sg forms for ena and isa. These provide the appropriate context for both rules, but the attested forms, wena and wisa, have undergone gliding, not deletion. Had deletion applied first, the prefix vowel would have been lost, giving the unattested *ena and *isa.
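
The ordering argument can be checked mechanically. The following is a minimal sketch that states the three rules as plain list functions rather than as transducers; the names glide, deleteV, deleteC, derive, consFirst and deletionFirst are hypothetical and are not part of the implementation below. The consonant rule is restricted to b, d and g, on the assumption that nasals do not delete.

```haskell
isVowel, isVoicedStop :: Char -> Bool
isVowel      = (`elem` "aeiou")
isVoicedStop = (`elem` "bdg")   -- assumption: only these consonants delete

toGlide :: Char -> Char
toGlide 'o' = 'w'
toGlide 'i' = 'j'
toGlide c   = c

-- Prevocalic gliding: a non-low vowel becomes a glide before another vowel.
glide :: String -> String
glide (x:y:rest)
  | isVowel x && x /= 'a' && isVowel y = toGlide x : glide (y:rest)
glide (x:rest) = x : glide rest
glide []       = []

-- Vowel deletion: the first of two adjacent vowels deletes.
deleteV :: String -> String
deleteV (x:y:rest)
  | isVowel x && isVowel y = deleteV (y:rest)
deleteV (x:rest) = x : deleteV rest
deleteV []       = []

-- Intervocalic consonant deletion: a voiced stop deletes between vowels.
deleteC :: String -> String
deleteC (v:c:w:rest)
  | isVowel v && isVoicedStop c && isVowel w = v : deleteC (w:rest)
deleteC (x:rest) = x : deleteC rest
deleteC []       = []

derive, consFirst, deletionFirst :: String -> String
derive        = deleteC . deleteV . glide   -- the order argued for above
consFirst     = deleteV . glide . deleteC   -- consonant deletion too early
deletionFirst = deleteC . glide . deleteV   -- vowel deletion before gliding
```

Under these definitions, derive "obina" gives "oina" and derive "oena" gives "wena", as attested. The reordered pipelines give the unattested forms: consFirst "obina" is "wina", and deletionFirst "oena" is "ena".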

References

Kenstowicz, Michael and Charles Kisseberth (1979). Generative Phonology: Description and Theory. Academic Press, New York.

Implementation

I implement a transduction from morphemes to allophones. I define an alphabet of morphemes as the source language of the transduction.

data Lomongo = SAY | WORK | SEE | HIDE | DANCE | BEGET | TwoSG | ThreePL deriving Show

Here is a transducer which maps morphemes to their phonological interpretation.

morph :: ForwardTrans () Lomongo String
morph = mkForwardTransducer [()] [SAY,WORK,SEE,HIDE,DANCE,BEGET,TwoSG,ThreePL] ['a'..'z'] () "" d f
  where
    f () = Just ""
    d () SAY = Just ((),"sanga")
    d () WORK = Just ((),"kamba")
    d () SEE = Just ((),"ena")
    d () HIDE = Just ((),"isa")
    d () DANCE = Just ((),"bina")
    d () BEGET = Just ((),"bota")
    d () TwoSG = Just ((),"o")
    d () ThreePL = Just ((),"ba")

Before implementing the rules given above, I define natural classes of phonemes, and specify how to make a vowel non-syllabic.¹

-- Natural classes over the atomic phonemes, written as predicates on characters.
vowel, consonant, voiced, low, continuant, voicedStop :: Char -> Bool
vowel s = s `elem` "aeiou"
consonant = not . vowel
voiced s = vowel s || s `elem` "bdgzmnwj"
low s = s == 'a'
-- Nasals are grouped with the continuants here, so voicedStop picks out only b, d and g.
continuant s = vowel s || s `elem` "nmzwj"
voicedStop s = voiced s && not (continuant s)

-- Make a vowel non-syllabic; only o and i have a glide counterpart here.
mkMinusSyl :: Char -> Char
mkMinusSyl s = case s of
                 'o' -> 'w'
                 'i' -> 'j'
                 _ -> s
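
To see which segments each class actually contains, the predicates can be applied to the whole alphabet. The sketch below repeats the definitions in compact form so that it stands alone; the helper members is hypothetical.

```haskell
vowel, voiced, continuant, voicedStop :: Char -> Bool
vowel s      = s `elem` "aeiou"
voiced s     = vowel s || s `elem` "bdgzmnwj"
continuant s = vowel s || s `elem` "nmzwj"
voicedStop s = voiced s && not (continuant s)

mkMinusSyl :: Char -> Char
mkMinusSyl 'o' = 'w'
mkMinusSyl 'i' = 'j'
mkMinusSyl s   = s

-- Hypothetical helper: the letters of the alphabet that belong to a class.
members :: (Char -> Bool) -> String
members p = filter p ['a'..'z']
```

Here members voicedStop is "bdg", so the nasals n and m fall outside the class targeted by consonant deletion. Likewise, map mkMinusSyl "aeiou" is "aejwu": only i and o are changed.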

With natural classes in hand, I now implement the rules needed for the analysis as transducers. I begin by defining types for states corresponding to the environments of the rules.

data PreVocalic = PV_Zero | PV_BeforeVowel Char
data InterVocalic = IV_Zero | IV_SeenV | IV_BeforeVowel Char

The PreVocalic data type allows us to represent the environment for rules whose environment is prevocalic; i.e. those of the form \(X \rightarrow Y \mathrel{|} \underline{\quad}V\). The constructor PV_Zero represents not having seen anything relevant for this rule. The constructor PV_BeforeVowel represents having seen an X, and that we are now looking to see if the next segment will be a vowel.

The InterVocalic data type allows us to represent the environment for rules whose environment is intervocalic; i.e. those of the form \(X \rightarrow Y \mathrel{|} V\underline{\quad}V\). The constructor IV_Zero represents not having seen anything relevant for this rule. The constructor IV_SeenV represents having just seen a vowel (i.e. the first part of the context for the rule). The constructor IV_BeforeVowel represents having just seen an X, immediately following a vowel, and that we are now looking to see if the next segment will be a vowel.

gliding :: ForwardTrans PreVocalic Char String
gliding = mkForwardTransducer states ['a'..'z'] ['a'..'z'] PV_Zero "" d f
  where
    states = PV_Zero : [PV_BeforeVowel c | c <- ['a'..'z'], vowel c]
    f PV_Zero = Just ""
    f (PV_BeforeVowel c) = Just [c]
    d PV_Zero c | vowel c && not (low c) = Just ((PV_BeforeVowel c),"")
                | otherwise = Just (PV_Zero,[c])
    d (PV_BeforeVowel b) c  | vowel c =
                              if not (low c) then Just ((PV_BeforeVowel c),[mkMinusSyl b])
                              else Just (PV_Zero,[mkMinusSyl b,c])
                            | otherwise = Just (PV_Zero,[b,c])
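
The effect of holding a vowel back in the state can be made visible by listing the output emitted at each step. The following is a sketch in plain Haskell that does not use Dfst: step and finish restate the d and f of gliding as total functions (so the Maybe wrapper is dropped), and chunks is a hypothetical helper.

```haskell
data PreVocalic = PV_Zero | PV_BeforeVowel Char

vowel, low :: Char -> Bool
vowel = (`elem` "aeiou")
low   = (== 'a')

mkMinusSyl :: Char -> Char
mkMinusSyl 'o' = 'w'
mkMinusSyl 'i' = 'j'
mkMinusSyl c   = c

-- Transition function of gliding: a non-low vowel is held back until
-- the next segment shows whether it stands before a vowel.
step :: PreVocalic -> Char -> (PreVocalic, String)
step PV_Zero c
  | vowel c && not (low c) = (PV_BeforeVowel c, "")
  | otherwise              = (PV_Zero, [c])
step (PV_BeforeVowel b) c
  | vowel c && not (low c) = (PV_BeforeVowel c, [mkMinusSyl b])
  | vowel c                = (PV_Zero, [mkMinusSyl b, c])
  | otherwise              = (PV_Zero, [b, c])

-- At the end of the input, a held vowel is released unchanged.
finish :: PreVocalic -> String
finish PV_Zero            = ""
finish (PV_BeforeVowel c) = [c]

-- Hypothetical helper: the output emitted at each step, then the final output.
chunks :: String -> [String]
chunks = go PV_Zero
  where
    go q []     = [finish q]
    go q (c:cs) = let (q', out) = step q c in out : go q' cs
```

For the input "oena", chunks yields ["", "w", "en", "a", ""]: nothing is emitted while o is held, the glide w appears once e has been read, and e itself is only released together with the following n. Concatenating the chunks gives "wena".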

vowelD :: ForwardTrans PreVocalic Char String
vowelD = mkForwardTransducer states ['a'..'z'] ['a'..'z'] PV_Zero "" d f
  where
    states = PV_Zero : [PV_BeforeVowel c | c <- ['a'..'z'], vowel c]
    f PV_Zero = Just ""
    f (PV_BeforeVowel c) = Just [c]
    d PV_Zero c | vowel c = Just (PV_BeforeVowel c,"")
                | otherwise = Just (PV_Zero,[c])
    d (PV_BeforeVowel b) c | vowel c = Just (PV_BeforeVowel c,"")
                           | otherwise = Just (PV_Zero,[b,c])

consD :: ForwardTrans InterVocalic Char String
consD = mkForwardTransducer states ['a'..'z'] ['a'..'z'] IV_Zero "" d f
  where
    states = (IV_Zero : IV_SeenV : [IV_BeforeVowel c | c <- ['a'..'z'], voicedStop c])
    f (IV_BeforeVowel c) = Just [c]
    f _ = Just ""
    d IV_Zero c | vowel c = Just (IV_SeenV,[c])
                | otherwise = Just (IV_Zero,[c])
    d IV_SeenV c | vowel c = Just (IV_SeenV,[c])
                 | voicedStop c = Just (IV_BeforeVowel c,"")
                 | otherwise = Just (IV_Zero,[c])
    d (IV_BeforeVowel b) c | vowel c = Just (IV_SeenV,[c])
                           | otherwise = Just (IV_Zero,[b,c])

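The rule ordering is implemented via transducer composition: each transducer's output is the input of the next, so the rules apply in the order in which the transducers are composed.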

lomongo = do
  mg <- morph `compose` gliding
  mgv <- mg `compose` vowelD
  mgv `compose` consD
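
What composition does can be illustrated without Dfst. The sketch below is an assumption about the general technique, not a description of the library: Trans, run and composeT are hypothetical names, and composeT returns a transducer directly, whereas the use of compose inside a do block above suggests that the library version can fail. The composed machine pairs the states of its two parts and feeds every chunk emitted by the first straight into the second.

```haskell
-- Hypothetical stand-in for a forward sequential transducer:
-- start state, transition function, final-output function.
data Trans q a b = Trans q (q -> a -> Maybe (q, [b])) (q -> Maybe [b])

run :: Trans q a b -> [a] -> Maybe [b]
run (Trans q0 d f) = go q0
  where
    go q []     = f q
    go q (x:xs) = do
      (q', out) <- d q x
      rest      <- go q' xs
      return (out ++ rest)

-- Product construction: no intermediate string is ever built.
composeT :: Trans p a b -> Trans q b c -> Trans (p, q) a c
composeT (Trans p0 d1 f1) (Trans q0 d2 f2) = Trans (p0, q0) d f
  where
    -- Push a chunk of intermediate symbols through the second machine.
    feed q []     = Just (q, [])
    feed q (b:bs) = do
      (q', out)   <- d2 q b
      (q'', rest) <- feed q' bs
      return (q'', out ++ rest)
    d (p, q) a = do
      (p', bs)  <- d1 p a
      (q', out) <- feed q bs
      return ((p', q'), out)
    f (p, q) = do
      bs        <- f1 p
      (q', out) <- feed q bs
      end       <- f2 q'
      return (out ++ end)

data PreVocalic = PV_Zero | PV_BeforeVowel Char

vowel, low :: Char -> Bool
vowel = (`elem` "aeiou")
low   = (== 'a')

mkMinusSyl :: Char -> Char
mkMinusSyl 'o' = 'w'
mkMinusSyl 'i' = 'j'
mkMinusSyl c   = c

-- The two prevocalic rules, restated over the hypothetical Trans type.
gliding, vowelD :: Trans PreVocalic Char Char
gliding = Trans PV_Zero d f
  where
    f PV_Zero            = Just ""
    f (PV_BeforeVowel c) = Just [c]
    d PV_Zero c
      | vowel c && not (low c) = Just (PV_BeforeVowel c, "")
      | otherwise              = Just (PV_Zero, [c])
    d (PV_BeforeVowel b) c
      | vowel c && not (low c) = Just (PV_BeforeVowel c, [mkMinusSyl b])
      | vowel c                = Just (PV_Zero, [mkMinusSyl b, c])
      | otherwise              = Just (PV_Zero, [b, c])
vowelD = Trans PV_Zero d f
  where
    f PV_Zero            = Just ""
    f (PV_BeforeVowel c) = Just [c]
    d PV_Zero c
      | vowel c   = Just (PV_BeforeVowel c, "")
      | otherwise = Just (PV_Zero, [c])
    d (PV_BeforeVowel b) c
      | vowel c   = Just (PV_BeforeVowel c, "")
      | otherwise = Just (PV_Zero, [b, c])
```

With these definitions, run (composeT gliding vowelD) maps "oena" to Just "wena" and "baena" to Just "bena". Composing in the opposite order, run (composeT vowelD gliding) "oena" gives Just "ena", which is the misordering ruled out in the description above.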

Verification

To test the implementation, I define a function which runs a machine on each input in a list of pairs of an input and its expected output, each time printing the actual result next to the expected one.

test :: (Show a,Show b,Eq b) => SeqTrans q a [b] dir -> [([a],[b])] -> IO ()
test t ws = sequence_ $ fmap attempt ws
  where
    attempt (i,o) = putStrLn (show i ++ "\t\t" ++ show (t `transduce` i) ++ "\t\t (should be: " ++ show o ++ ")")
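
Because test only prints, a reader has to compare the two columns by eye. As a possible complement, the sketch below collects just the failing cases; mismatches is a hypothetical helper, and surface is a toy lookup table standing in for a transducer.

```haskell
-- Hypothetical pure counterpart to test: keep only the cases whose
-- actual result differs from the expected one.
mismatches :: Eq b => (a -> Maybe b) -> [(a, b)] -> [(a, Maybe b, b)]
mismatches run cases =
  [ (i, got, want) | (i, want) <- cases, let got = run i, got /= Just want ]

-- Toy stand-in for a transducer: underlying form to surface form.
surface :: String -> Maybe String
surface u = lookup u [("oena", "wena"), ("obina", "oina")]
```

An empty result means that every case passed. For instance, mismatches surface [("oena","wena"),("obina","wina")] returns [("obina", Just "oina", "wina")].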

The program can be tested using the following function.

main :: IO ()
main = do 
  sequence_ $ fmap (\t -> test t testData) lomongo
  return ()
    where
      testData = [([TwoSG,SAY],"osanga")
                 , ([ThreePL,SAY],"basanga")
                 , ([TwoSG,WORK],"okamba")
                 , ([ThreePL,WORK],"bakamba")
                 , ([TwoSG,SEE],"wena")
                 , ([ThreePL,SEE],"bena")
                 , ([TwoSG,HIDE],"wisa")
                 , ([ThreePL,HIDE],"bisa")
                 , ([TwoSG,DANCE],"oina")
                 , ([ThreePL,DANCE],"baina")
                 , ([TwoSG,BEGET],"oota")
                 , ([ThreePL,BEGET],"baota")
                 ]

The output should look like the following:

*Main> main
[TwoSG,SAY]             Just "osanga"            (should be: "osanga")
[ThreePL,SAY]           Just "basanga"           (should be: "basanga")
[TwoSG,WORK]            Just "okamba"            (should be: "okamba")
[ThreePL,WORK]          Just "bakamba"           (should be: "bakamba")
[TwoSG,SEE]             Just "wena"              (should be: "wena")
[ThreePL,SEE]           Just "bena"              (should be: "bena")
[TwoSG,HIDE]            Just "wisa"              (should be: "wisa")
[ThreePL,HIDE]          Just "bisa"              (should be: "bisa")
[TwoSG,DANCE]           Just "oina"              (should be: "oina")
[ThreePL,DANCE]         Just "baina"             (should be: "baina")
[TwoSG,BEGET]           Just "oota"              (should be: "oota")
[ThreePL,BEGET]         Just "baota"             (should be: "baota")

Footnotes:

¹ This essentially implements a feature system over the atomic phonemes. Often, we begin with features, and define the phonemes in terms of them. I could have done this here as well, but thought I would stay instead with simple characters.