Computerlinguistik
WS 2019/20

Module Computerlinguistik (04-006-1006)
Times Mon: 11:15-12:45 (HS 20)
  Wed: 13:15-14:45 (HS 20)
Instructor Greg Kobele (GWZ H1 5.11)
Office hours Tue: 14:30-15:30
Tutors Anton Hampe
  Oskar Nuernbergk
Tutorials Wed: 15:15-16:45 (S410)
  Thu: 17:15-18:45 (S410)
  Moodle

Example Projects

Some Tree Transducer Examples

Examples (html | hs)

A transducer over strings (Morpho-phonology)

An acceptor over strings (Phonotactics)

Code Snippets

Tree Transducers

The code for tree transducers of various sorts is here (html | hs)

Transducers

The code for sequential transducers is here (html | hs)

Automata

The code for deterministic finite state automata is here (html | hs)

TSL

The code for TSL languages is here (html | hs)

Announcements

Tutorium

  • Informative slides for the Tutorium
  • Please send your time preferences: fill out the Doodle poll by Friday!

Literature

Course Log

<2020-01-13 Mon>

We thought about how to extend our notion of strict locality from strings to trees.

We first characterized strings as a unary branching tree:

stringToTree :: [a] -> Tree a
stringToTree [] = undefined -- There is no empty tree, so there is no
                            -- tree which corresponds to the empty
                            -- string
stringToTree [x] = Node x []
stringToTree (x:xs) = Node x [stringToTree xs]

unaryTreeToString :: Tree a -> [a]
unaryTreeToString (Node x []) = [x]
unaryTreeToString (Node x [t]) = x : unaryTreeToString t

A 2-SL characterization of a string restricts neighbors (i.e. which segment may follow an immediately preceding one). Viewing a string as a unary tree, we see that it restricts parents and their children. We decided to extend 2-strict locality to trees by taking a tree 2-factor to be a mother paired with a possible daughter sequence.

Given a tree, we can collect its k-factors.

-- prune n keeps n+1 levels of a tree: a tree 2-factor (a mother plus
-- its daughter sequence) is obtained via prune 1
prune 0 (Node a _) = Node a []
prune n (Node a ts) = Node a $ fmap (prune (n-1)) ts

-- collect the prunings at every node (children is from the
-- <2020-01-06> entry below)
extractKFactors k t = prune k t : concat (fmap (extractKFactors k) (children t))

Verifying that a tree is grammatical with respect to a k-SL grammar is just verifying that its k-factors are permitted by the grammar.

type TreeFactor a = Tree a
type TreeSL a = [TreeFactor a]

subseteq :: Eq a => [a] -> [a] -> Bool
xs `subseteq` ys = all (`elem` ys) xs

maxInt :: (Num a, Ord a) => [a] -> a
maxInt = foldr max 0

depth :: (Num b, Ord b) => Tree a -> b
depth (Node _ ts) = 1 + maxInt (fmap depth ts)

isGrammatical :: Eq a => TreeSL a -> Tree a -> Bool
isGrammatical g = go
  where
    k = maxInt (fmap depth g) - 1  -- a factor of depth d is a pruning at depth d - 1
    go t = extractKFactors k t `subseteq` g
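
As a small worked example (the grammar is my own, not from class): a toy 2-SL tree grammar licensing binary-branching 'S' nodes over 'a' leaves.

gEx :: TreeSL Char
gEx = [ Node 'S' [Node 'S' [], Node 'S' []] -- an S mother with two S daughters
      , Node 'S' [Node 'a' []]              -- an S mother with a single 'a' daughter
      , Node 'a' []                         -- an 'a' leaf
      ]

-- > isGrammatical gEx (Node 'S' [Node 'S' [Node 'a' []], Node 'S' [Node 'a' []]]) == True
-- > isGrammatical gEx (Node 'S' [Node 'a' [], Node 'a' []]) == False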

<2020-01-08 Wed>

We continued discussing trees.

We noted that trees, just like lists, contained data, and that we might like to extract this data from a given tree. In the case of lists, the data is presented in a fixed order, and so it always makes sense to ask for the 'next' datum. In the case of trees, however, while the current datum is at the root, it is non-obvious what the 'next' one should be. A strategy for serializing the nodes of a tree answers this question, and is called a (tree-)traversal. We can view such a strategy as a function mapping a tree to a list of its nodes (the list being the answer to the question: in what order do we see the nodes of the tree). If we know nothing about the kind of data the tree stores, we must serialize the tree based purely on its geometry. There are two well-known strategies for doing this:

depth first
serialize the daughter subtrees completely before moving on to the next one (earlier daughters are serialized before later daughters)
breadth first
serialize all nodes at each level of the tree before moving on to the next level

A depth first traversal looks much like the yield function:

depthFirst :: Tree a -> [a]
depthFirst (Node a ts) = a : concat (depthFirst <$> ts)

Here the root (a) of the tree is placed at the front of the list, and the daughters of the root are serialized in the order in which they occur.

A breadth first traversal makes use of an auxiliary function which serializes the nodes in a list of trees in a breadth first way.

breadthFirst t = bf [t]
  where
    bf [] = []
    bf (Node a ts : tts) = a : bf (tts ++ ts)

Here the auxiliary function bf maintains a queue of trees to serialize. At each step, it picks the tree at the front of the queue, places its root at the front of the serialized list, and its daughters are placed at the end of the queue.
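
A small example contrasting the two traversals (the tree is my own, not from class):

-- > t = Node 1 [Node 2 [Node 4 []], Node 3 []]
-- > depthFirst t   == [1,2,4,3]
-- > breadthFirst t == [1,2,3,4]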

We posed the question: how can we enumerate all sentences over a given alphabet? As each non-empty sentence is the result of prefixing a letter onto another sentence, we have a simple way of growing a tree containing all sentences:

wordTree alphabet = iter prefix []
  where
    prefix ws = (: ws) <$> alphabet

The tree wordTree "ab" contains all words over the alphabet \(\Sigma = \{a,b\}\), with the empty string at the root, and the daughters of a node being one letter longer than their parent.

Enumerating the words of an alphabet in order of increasing length is then just a matter of traversing this tree in a breadth-first manner.

allWords = breadthFirst . wordTree
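
For instance (my own check, not from class):

-- > take 7 (allWords "ab") == ["","a","b","aa","ba","ab","bb"]
-- (breadth first order: the empty word first, then all words of length 1, and so on)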

<2020-01-06 Mon>

Today we introduced trees as inductive data types.

data Tree a = Node a [Tree a] deriving (Eq, Show)
-- (deriving Eq lets us compare trees for equality, which isGrammatical
-- above relies on; Show lets ghci display them)

In English: a Tree with nodes labeled with objects of type a consists of a Node labeled with some a, together with a list of Trees with node labels from a.

A given node consists of a parent (the label of that node) and its children (its list of trees).

parent (Node p _) = p
children (Node _ ts) = ts

A leaf is a node without children.

isLeaf = null . children
-- > isLeaf (Node _ []) = True
-- > isLeaf (Node _ (_:_)) = False

Recursive functions over trees often involve mapping themselves over the list of children. Adding all numbers in a tree of integers is an example.

sumTree :: Tree Int -> Int
sumTree (Node i ts) = i + sum (sumTree <$> ts)
-- (<$>) is an infix notation for the function 'map'

-- recall that
-- > sum = foldr (+) 0
-- > map f = foldr ((:) . f) []
-- and that
-- > foldr f e . map g == foldr (f . g) e
-- so therefore
-- > sum (sumTree <$> ts) == foldr ((+) . sumTree) 0 ts

As foldr encapsulates a basic recursion scheme over lists, we can use it to write a tree version of itself (and see sumTree as a special case).

foldTree :: (a -> b -> b) -> b -> Tree a -> b
foldTree cons nil (Node a ts) = a `cons` foldr cons nil (foldTree cons nil <$> ts)

-- To avoid clutter, we could write the above as follows
--
-- > foldTree cons nil = goTree
-- >  where
-- >    go = foldr cons nil
-- >    goTree (Node a ts) = a `cons` go (goTree <$> ts)
--
-- We can then deforest (combine the ~foldr~ and the ~map~) to obtain
--
-- > foldTree cons nil = goTree
-- >  where
-- >    goTree (Node a ts) = a `cons` go ts
-- >    go = foldr (cons . goTree) nil

sumTree = foldTree (+) 0

We can define the yield of a tree (the sequence of leaves read from left to right).

-- The yield of a leaf is itself (as a sequence of length 1)
yield (Node a []) = [a]
-- The yield of a non-leaf is the concatenation of the yields of its daughters
yield (Node _ ts) = concat (yield <$> ts)

Trees in linguistics are often viewed as representing the structure (whatever that means) of an expression. Outside of linguistics, trees are commonly used to represent (the history of) a process. For example, a family tree represents the history of a reproductive process, with the parent being a literal parent, and its children being their literal children. In general, we need a next function, which assigns to an individual a list of outcomes; in the case of family trees, the next function assigns to an individual the list of their children (say, ordered according to age). Given such a function together with a starting point, we can create a tree whose root is that starting point, and whose children are the results of iterating this process on the outcomes of the function at the starting point.

iter :: (a -> [a]) -> a -> Tree a
iter next start = Node start (iter next <$> next start)

Consider a tree of integers where the integer associated with any given node determines the number of its identical children, and where the integer value of a parent is one greater than the (equal) integer values of the roots of its children. In this tree, all leaves will have the value 0, the parent of a leaf will have a value of 1, and the leaf will be an only child. The parent of a parent of a leaf will have a value of 2, and it will have two children (with value 1) and so on. Such a tree can be defined using iter as follows.

tV :: Int -> Tree Int
tV = iter var
  where
    var n = take n $ repeat (n - 1) -- n outcomes, each with value n - 1

-- > tV 0 == Node 0 []
-- > tV 1 == Node 1 [tV 0]
-- > tV 2 == Node 2 [tV 1, tV 1]
-- > tV 3 == Node 3 [tV 2, tV 2, tV 2]

We can ask: what is the sum of the nodes in a tree of the form tV k? We can begin by writing a simple function that creates a tree of this type and then computes its sum, and running it on the first 10 integers.

variations :: Int -> Int
variations = sumTree . tV

-- > variations <$> [1..10]
-- > [1,4,15,64,325,1956,13699,109600,986409,9864100]
-- found in the OEIS: https://oeis.org/A007526

A little manipulation allows us to eliminate the intermediate tree, and obtain a numerical function:

{-
> variations 0 
> == sumTree (tV 0)
> == sumTree (Node 0 [])
> == 0
-}
variations 0 = 0
{-
> variations (n+1)
> == sumTree (tV (n+1))
> == sumTree (Node (n+1) $ take (n+1) $ repeat $ tV n)
> == (n+1) + sum (map sumTree $ take (n+1) $ repeat $ tV n)
> == (n+1) + sum (take (n+1) $ map sumTree $ repeat $ tV n)
> == (n+1) + sum (take (n+1) $ repeat $ sumTree $ tV n)
> == (n+1) + sum (take (n+1) $ repeat $ variations n)
> == (n+1) + (n+1) * variations n
> == (n+1) * (1 + variations n)
-}
variations n = n * (1 + variations (n-1))

<2019-12-18 Wed>

We discussed general finite state transducers, Haskell, and how the course was going. Merry Christmas!

<2019-12-16 Mon>

We introduced tier-based strictly local transductions, on analogy with TSL automata. We used them to model the Slovenian and Georgian examples whose phonotactics we had previously modeled with TSL automata. We saw that in TSL transductions the correspondence simultaneous = ISL and sequential = OSL no longer holds, intuitively because the notion of locality in ITSL transductions has changed: an off-tier segment is local to the last on-tier segment encountered.

<2019-12-11 Wed>

We mirrored our previous foray into the strictly piecewise languages with the strictly piecewise transducers. A (2-)ISP transduction is one where the states represent sets of previously seen input symbols, and a transition reading a takes you to a state where a is added to the set of symbols in the source state. A (2-)OSP transduction is one where the states represent sets of previously seen output symbols, and a transition outputting a takes you to a state where a is added to the set of symbols in the source state. Somewhat surprisingly, ISP transductions are not useful for describing bidirectional long distance (left-to-right) harmony processes such as sibilant harmony, where s and S (esh) cannot cooccur, and all subsequent sibilants harmonize with the first sibilant in a string. This is because once you have seen a second (different) sibilant, your state contains both kinds of triggers, and you can no longer distinguish which came first. In contrast, OSP transductions are able to describe such a process.
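
A minimal sketch of the ISP state update (the encoding is mine, not the class's code), which makes this order-blindness concrete:

import qualified Data.Set as Set

ispStep :: Ord a => Set.Set a -> a -> Set.Set a
ispStep seen a = Set.insert a seen

-- reading "sS" and reading "Ss" lead to the same state, so which
-- sibilant came first has been forgotten:
-- > foldl ispStep Set.empty "sS" == foldl ispStep Set.empty "Ss"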

<2019-12-09 Mon>

We discussed how to combine ISL and OSL transductions. At the level of machines, an (i,j)-IOSL transition from state (α,u) reading a and outputting v takes you to the state (takeEnd i (αa), takeEnd j (uv)). We saw that this sort of machine could represent iterative rules with two-sided contexts.
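
A sketch of this next-state map (the names are mine; takeEnd keeps the last n elements of a list, and can be defined directly):

takeEnd :: Int -> [a] -> [a]
takeEnd n xs = drop (length xs - n) xs

ioslNext :: Int -> Int -> ([a], [b]) -> a -> [b] -> ([a], [b])
ioslNext i j (alpha, u) a v = (takeEnd i (alpha ++ [a]), takeEnd j (u ++ v))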

We also discussed how to incorporate transducers into an account of our language use. We saw that production (how to map an underlying form to a surface form) was trivial, as our machines are set up to do this deterministically. However, an account of perception (how to reconstruct an underlying form from a surface form) required more from us. We saw how to set up a search tree (which we poetically called a 'Tree of Possibilities') which represented all ways of moving through our machine while producing the desired output. A node in a search tree was a triple of the form (state,inputConsumed,outputRemaining), which represented

  1. the state of the machine we are in
  2. the input it took to move us there
  3. the output we still need to account for

The daughters of such a node in the search tree are determined by the outgoing transitions of the machine at that state.
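
A sketch of such a search tree, using the Tree type and the iter function from the <2020-01-06> entry above (the quadruple encoding of transitions and all the names here are my own assumptions, not the class's actual code):

import Data.List (stripPrefix)

type Transition q = (q, Char, String, q) -- (source, input, output, target)

possibilities :: Eq q => [Transition q] -> q -> String -> Tree (q, String, String)
possibilities trans q0 out = iter next (q0, [], out)
  where
    -- one daughter per outgoing transition whose output is a prefix
    -- of the output we still need to account for
    next (q, inp, rest) =
      [ (q', inp ++ [a], rest')
      | (p, a, v, q') <- trans
      , p == q
      , Just rest' <- [stripPrefix v rest] ]

-- a node whose outputRemaining is empty (at a final state) represents
-- a successful reconstruction of an underlying form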

<2019-12-04 Wed>

We observed that progressive nasal spreading (as found in Malay) was not ISL. The problem was that the trigger (the nasal) was not local to the segment(s) that we wanted to change (the vowels/semivowels). This was somewhat surprising, as intuitively the nasal trigger is local to the next segment to change, as it is spreading from one segment to the next. We realized that this locality is at the level of the output, and formulated OSL (output strictly local) transducers as those whose next state is determined by the previously written (outputted) symbols.
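
A toy sketch of such an OSL mapping (the segment inventory and the uppercase-as-nasalized encoding are my own simplifications, not the class's Malay data): the state is just the last symbol written.

import Data.Char (toUpper)

oslNasal :: Char -> String -> String
oslNasal _ [] = []
oslNasal prev (x:xs)
  | nasal prev && spreadable x = let y = toUpper x in y : oslNasal y xs
  | otherwise                  = x : oslNasal x xs
  where
    nasal c = c `elem` "mn" || c `elem` "AEIOUWY" -- nasalized outputs keep spreading
    spreadable c = c `elem` "aeiouwy"

-- > oslNasal '#' "mawar" == "mAWAr" -- spreading passes through vowels and glides, stops at 'r'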

handout
Example Project

<2019-12-02 Mon> No class: Dies Academicus

<2019-11-27 Wed>

We introduced finite state transducers, by adding outputs to transitions (and final states). We continued working with SL transducers (where each state represents a factor, and a transition reading y connects a state "xα" to "αy"), and practiced writing ISL transducers for various phonological phenomena.

<2019-11-25 Mon>

We discussed a state-merging learning algorithm for SL languages, using finite state machines as our grammars, rather than sets of factors.

<2019-11-20 Wed> No class: Buß- und Bettag

<2019-11-18 Mon>

We discussed how a SL/SP/TSL grammar could be represented as a finite state machine, with various interpretations of the states:

SL
A state represents the last k-1 segments seen
SP
A state represents the set of k-1 segment sequences seen to date
TSL
A state represents the last k-1 tier-relevant segments seen

We saw that FSMs could also (and quite simply) define languages which seem unlike anything found in human language, such as the language where words must have a number of vowels evenly divisible by 7.
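
A quick sketch of that vowel-counting language (encoding mine, not from class): the state is just the vowel count mod 7, and we accept in state 0.

divisibleVowels :: String -> Bool
divisibleVowels = (== 0) . foldl step 0
  where
    step q c = if c `elem` "aeiou" then (q + 1) `mod` 7 else q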

<2019-11-13 Wed>

We discussed the foldr function, which encapsulates the following sort of recursive definition:

myFunction [] = nil
myFunction (x:xs) = x `cons` myFunction xs

Using foldr, we can define the above function in the following way:

myFunction = foldr cons nil

Very many functions over lists take this form…

length [] = 0
length (x:xs) = 1 + length xs
-- foldr (\x xs -> 1 + xs) 0

sum [] = 0
sum (x:xs) = x + sum xs
-- foldr (+) 0

product [] = 1
product (x:xs) = x * product xs
-- foldr (*) 1

map f [] = []
map f (x:xs) = f x : map f xs
-- foldr (\x xs -> f x : xs) []
-- == foldr ((:) . f) []

all p [] = True
all p (x:xs) = p x && all p xs
-- foldr (\x xs -> p x && xs) True
-- == foldr ((&&) . p) True

filter p [] = []
filter p (x:xs) = if p x
                  then x : filter p xs
                  else filter p xs
-- foldr (\x -> if p x then (:) x else id) []

zipWith f [] = \l2 -> []
zipWith f (x:xs) = \l2 -> if null l2 then [] else f x (head l2) : zipWith f xs (tail l2)
-- foldr (\x xs l2 -> case l2 of {[] -> []; (y:ys) -> f x y : xs ys}) (const [])


-- append is usually defined as per the following:
append [] ys = ys
append (x:xs) ys = x : append xs ys
-- If we reverse the order of the arguments like so:
appendFlip ys [] = ys
appendFlip ys (x:xs) = x : appendFlip ys xs
-- Then appendFlip is directly a fold:
-- appendFlip = foldr (:)
-- and so append == flip (foldr (:))
-- We might keep the order of the arguments, but not take the second list:
appendWait [] = \ys -> ys -- == id
appendWait (x:xs) = \ys -> x : appendWait xs ys -- == (:) x . appendWait xs
-- This can be defined using a fold:
-- foldr (\x xs -> (:) x . xs) id

<2019-11-11 Mon>

We discussed learning paradigms in general, and worked through a learning algorithm for k-TSL languages (for fixed k).

Code

The code for TSL languages is here (html | hs)

Reading

<2019-11-06 Wed>

We spoke about combining SP and SL perspectives on phonotactics by representing grammatical knowledge as consisting of both an SL grammar and an SP grammar. Learning is done by running the SL and SP learning algorithms independently over the same data, parsing is done by running both the SL and SP parsers and accepting if both accept, and so on.
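
A sketch of this combined knowledge (the pairing and the names are mine; isWellFormedTwoFac is from the <2019-10-28> entry, and pFacTwo from the <2019-11-04> entry):

type CombinedGrammar a = ([[Delimited a]], [[a]]) -- (SL grammar, SP grammar)

isWellFormedBoth :: Eq a => CombinedGrammar a -> [a] -> Bool
isWellFormedBoth (sl, sp) word =
  isWellFormedTwoFac sl word && all (`elem` sp) (pFacTwo word)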

We discussed blocking effects, and how SP could not account for them. We observed that, if we ignored all segments but the ones which were interacting over long distances, we could use an SL grammar for these long distance dependencies as well. We called this sort of grammar TSL, and it involves specifying an SL grammar, as well as a set of segments to ignore when computing locality.
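
A sketch of a TSL parser along these lines (names mine): first project away the ignored segments, then check the projection with the SL grammar.

tslWellFormed :: Eq a => [a] -> [[Delimited a]] -> [a] -> Bool
tslWellFormed ignored slG = isWellFormedTwoFac slG . filter (not . (`elem` ignored))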

<2019-11-04 Mon>

We introduced SP (strict precedence) grammars. In an SP grammar, a factor ab bans (or permits) an a preceding a b. As with the SL grammars, learning, perception and production depend on a way of decomposing a string into its factors (now with this new meaning).

import Data.List (tails,subsequences)

-- The first idea is to take the first letter, and add it to the rest of the string,
-- and then to repeat this for each successive sublist. 
---- note that ((:) w . pure) x == w : [x] == [w,x]
pFacTwo [] = []
pFacTwo (w:ws) = combine w ws ++ pFacTwo ws
  where
    combine w = fmap ((:) w . pure)

-- The second idea is to first associate with each letter in the word the list of letters we would like to pair it with.
---- note that (fmap . (:)) w ws == fmap ((:) w) ws == fmap (\x -> w:x) ws
pFacTwo' [] = []
pFacTwo' word = concat $ zipWith (fmap . (:)) word $ tail $ tails $ fmap pure word

-- Finally, the Data.List module contains the function subsequences, which gives us all factors of a list:
---- subsequences [1,2,3] == [[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]
-- If we just want the 2-factors, we simply discard anything of a different length
pFacTwo'' = filter ((==) 2 . length) . subsequences

<2019-10-30 Wed>

We discussed an alternative representation for SL-k grammars, in terms of a graph: each node of the graph corresponds to a possible 2-factor, and edges in the graph are labeled with segments. An edge labeled x connects two nodes ab and cd just in case b = c and x = d.
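
Rendered as code (a sketch, names mine):

-- an edge labeled x connects node (a,b) to node (c,d) just in case b == c and x == d
edgeLabeled :: Eq a => a -> (a, a) -> (a, a) -> Bool
edgeLabeled x (a, b) (c, d) = b == c && x == d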

We observed that, taking words (or morphemes) as the symbols over which our factors are stated gives rise to syntactic grammars. We noted that we could better and better approximate true grammaticality by increasing the size of our factors.

We saw that all SL-k languages had a certain property, which we might call k-suffix closure:

A stringset L is SL-k iff whenever there is a string x of length k − 1 and strings u₁, v₁, u₂, and v₂ such that

1. u₁ x v₁ ∈ L, and
2. u₂ x v₂ ∈ L,

then it is also the case that

3. u₁ x v₂ ∈ L

In other words, long distance dependencies can only be enforced between elements separated by fewer than k words.

This allowed us to prove that English was not strictly local, for any choice of k, by noting that there were sentences where the distance between two dependent elements grew without bound:

  1. Which monkey did they say (that they said)ᵏ likes bananas
  2. Which monkeys did they say (that they said)ᵏ like bananas

but never:

  3. Which monkey did they say (that they said)ᵏ like bananas

<2019-10-28 Mon>

Building on the breaking of a string into its 2-factors:

data Delimited a = Start | Mid a | End deriving (Eq,Ord)

delim :: [a] -> [Delimited a]
delim word = [Start] ++ fmap Mid word ++ [End]

undelim :: [Delimited a] -> [a]
undelim = foldr (\c -> case c of {Mid a -> (:) a; _ -> id}) []

twoFac = tF . delim
  where
    tF s = zipWith (:) s $ fmap pure (tail s)

We discussed how to implement the 'three pillars of linguistics':

  1. learning

    learnTwoFac :: [[a]] -> [[Delimited a]]
    learnTwoFac txt = concat (fmap twoFac txt)
    -- learnTwoFac = concat . fmap twoFac
    
    
  2. understanding

    isWellFormedTwoFac :: Eq a => [[Delimited a]] -> [a] -> Bool
    isWellFormedTwoFac grammar word = twoFac word `subset` grammar
      where
        subset u v = all (\x -> x `elem` v) u
    
  3. speaking

    generateTwoFac grammar = gen Start []
      where
        gen symbol word = do
          newSymbol <- nexts symbol
          case newSymbol of
            End -> return (reverse word)
            _ -> gen newSymbol (newSymbol : word)
        nexts c = [x | [u,x] <- grammar, u == c]
    

Especially in production ('speaking'), there are many important influences on what we do that are not modeled by the grammar. The generateTwoFac function given above abstracts away from all of these: it provides us simply with a list of the possible forms given the grammar, and leaves it to us to determine which of these we actually produce. One canonical way of modeling these unknown influences is by weighting each 2-factor according to how likely we are to use it, and then deriving a weight for each possible string according to how heavy its factors are. If the weights satisfy certain principles, they constitute a probability distribution, and we can select a string at random based on this distribution.
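
A sketch of such a weighting (my own types and names, using twoFac from above): assign each 2-factor a weight, and score a string by multiplying the weights of its 2-factors.

import Data.Maybe (fromMaybe)

scoreTwoFac :: Eq a => [([Delimited a], Double)] -> [a] -> Double
scoreTwoFac weights = product . fmap weightOf . twoFac
  where
    weightOf f = fromMaybe 0 (lookup f weights) -- unseen factors get weight 0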

It is possible to generalize our factor extraction algorithm to work for factors of any particular length:

-- This generalizes our first algorithm (twoFac1); see in particular
-- the locally defined function fack and compare this with twoFac1.
fac1 k = fack . delim
  where
    fack word = if length word < k then []
                else take k word : fack (tail word)

-- This function implements a variant of the first algorithm.  Instead
-- of doing the recursion itself, it exploits the function tails (from
-- module Data.List).
fac1' k = filter ((== k) . length) . fmap (take k) . tails . delim

-- This generalizes our second algorithm (twoFac2), which is the
-- special case where the input to zipAll is the list [l,tail l].  In
-- this special case, zipAll unrolls to:
---- zipWith (:) l $ zipAll [tail l]
---- == zipWith (:) l $ zipWith (:) (tail l) $ repeat []
---- == zipWith (:) l $ fmap pure (tail l)
fac2 k = zipAll . take k . tails . delim
  where
    zipAll = foldr (zipWith (:)) (repeat [])
--    zipAll [] = repeat []
--    zipAll (l:ls) = zipWith (:) l $ zipAll ls
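
As a sanity check (my own), all three generalizations agree with twoFac from the <2019-10-28> entry at k = 2:

-- > fac1 2 "ab" == twoFac "ab"
-- > fac1' 2 "ab" == twoFac "ab"
-- > fac2 2 "ab" == twoFac "ab"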

<2019-10-23 Wed>

We discussed two ways of breaking up a string into its 2-factors in Haskell.

twoFac1 s = if length s < 2 then []
            else take 2 s : twoFac1 (tail s)

twoFac2 s = zipWith (:) s $ fmap pure (tail s)

<2019-10-21 Mon>

We discussed strictly local characterizations of phonotactic well-formedness.

  • A strictly k-local grammar is a finite set of sequences of segments of length up to k

We explored how to

  • learn a strictly k-local grammar from text
  • produce a grammatical sentence from a strictly k-local grammar
  • parse a sentence using a strictly k-local grammar

<2019-10-16 Wed>

notes

Today we went over familiar set theoretic notions, before presenting two formal descriptions of strings.

  1. position based
  2. first vs rest

We discussed how to translate the second representation into the first.
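
A sketch of the translation (encoding mine): taking the first vs rest representation to be an ordinary Haskell list, the position based representation pairs each segment with its position.

positionBased :: [a] -> [(Int, a)]
positionBased = zip [1 ..]

-- > positionBased "ab" == [(1,'a'),(2,'b')]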

Homework

  1. represent the string "greg" in both ways.
  2. think about how to translate the first representation scheme into the second.

<2019-10-14 Mon>

Homework

  1. Enter your preferences into the Tutorium's (Tutorii?) Doodle poll.
  2. Install Haskell, and start ghci. You should see something like the following.

    GHCi, version 8.8.1: http://www.haskell.org/ghc/  :? for help
    Prelude>
    

    Type :q to exit.

  3. Read the excerpt from the Sipser book (chapter 0), and do exercises 0.1 to 0.6. Most important are sections 0.1 to 0.2 (i.e. up to page 17).

Author: Greg Kobele

Created: 2020-04-05 Sun 23:28

Validate