Context Free Grammars

While phonology and morphology are about interpreting structures (phonology interprets underlying forms as surface forms, and morphology complex heads as words), the role of syntax is that of circumscribing the class of well-formed structures.

Syntactic structures are often thought of as trees, although often more complicated structures violating the single mother condition are used. Perhaps the simplest way of specifying the well-formedness of a tree is locally: a tree is well-formed just in case all of its parts are.

Figure 1: An example tree

Figure 1: An example tree

Consider the tree in figure fig:CFG.Tree1, which represents the surface syntactic structure of the sentence “Every monkey ate a banana quickly.” What are its local parts? Viewing a tree as a special kind of directed graph (one where no more than one edge enters any node, for example), a tree consists of immediate connections between nodes: mother-daughter relationships. In the tree in the figure, the node S is the mother of the two nodes NP and VP, in that order. We typically write this in a ‘rule’-like notation: \(S \rightarrow NP
VP\). Such a rule is sometimes called a ‘production.’ The mother-daughter relationships exhibited by the tree in figure fig:CFG.Tree1 are given as productions below.

\(S\rightarrow NP\ VP\) \(NP \rightarrow Det\ N\)
\(VP \rightarrow VP\ AdvP\) \(Det \rightarrow every\)
\(VP \rightarrow V\ NP\) \(Det \rightarrow a\)
\(AdvP \rightarrow Adv\) \(N \rightarrow banana\)
\(Adv \rightarrow quickly\) \(N \rightarrow monkey\)

Every tree gives rise to a set of productions (those which encode its mother-daughter relationships). However, different trees can give rise to the same set of productions. The tree in figure fig:CFG.Tree2 engenders the same set of productions as the one in figure fig:CFG.Tree1.

Figure 2: A tree with the same mother-daughter relationships as in figure fig:CFG.Tree1

Figure 2: A tree with the same mother-daughter relationships as in figure fig:CFG.Tree1

A production can be viewed as a permissible mother-daughter relationship, and a set of productions as a set of allowable local tree configurations. A tree is licensed by a set of productions just in case all of its mother-daughter relationships are allowed. We are typically interested not just in any old tree licensed by a set of productions, but rather in those which represent complete sentences. Furthermore, we typically make a distinction between terminal symbols (such as letters in words) and non-terminal symbols (such as category names). We require that terminal symbols can never have daughters (i.e. that terminal symbols can never appear on the left hand side of any production.) A context-free grammar consists of a set of productions together with a designated ‘start’ node label, and a demarcation of terminal and non–terminal symbols.

We recall the definition of trees, and folding over trees.

> data Tree a = Node {root :: a, subForest :: [Tree a]}
> foldTree combine = evalTree
>   where
>     evalTree (Node a ts) = a `combine` map evalTree ts

We define a production to consist of a non-terminal label together
with a sequence of terminals and non-terminals.

> infixl 6 :-
> data Production n t = n :- [Either n t] deriving Eq

A context-free grammar consists of a designated non-terminal symbol
(the start category) and a list of productions.

> data CFG n t = CFG n [Production n t]

A 'CFG n t' will accept a tree with node labels of the form 'Left n'
or 'Right t'.  To verify whether a tree is accepted by a CFG, we check
whether each mother-daughters configuration is licensed by a
production, starting at the root.

> acceptTD :: (Eq n,Eq t) => CFG n t -> Tree (Either n t) -> Bool
> acceptTD (CFG start prods) t = (root t == Left start) && go t
>   where
>     go (Node (Left a) ts) | (a :- map root ts) `elem` prods = all go ts
>                           | otherwise = False
>     go (Node (Right b) ts) = null ts

We can also proceed in a bottom-up manner.  We move upwards in the
tree, from leaves to root, checking at each level whether the
mother-daughter relations are licensed by the productions.  Each
subtree is evaluated to a pair consisting of its root label, and a
boolean value expressing whether all of its sub-parts were
well-formed.

> acceptBU :: (Eq n,Eq t) => CFG n t -> Tree (Either n t) -> Bool
> acceptBU (CFG start prods) = (== (Left start,True)) . foldTree check
>  where
>    check n@(Left node) daughters = let (dhts,checked) = unzip daughters
>                                  in
>                                    (n,and checked
>                                       && ((node :- dhts) `elem` prods))
>    check n@(Right node) daughters = (n,null daughters)