Using finite automata

Finite state automata allow us to represent an infinite collection of grammatical forms in a finite way.

In addition, they are a foundational grammatical model, and play an important role in the field of formal language theory. There are a number of resources for learning more about this subject, beginning with the Wikipedia entry:

There are a number of different ways to represent finite automata. In the introductory computer science literature, you often find definitions of finite automata of the following form:

A (non-deterministic) finite automaton is a 5-tuple \(\langle Q,\Sigma,I,\Delta,F\rangle\), where

  • \(Q\) is a finite set of states
  • \(\Sigma\) is a finite set of alphabet symbols
  • \(I\) is a subset of \(Q\), the initial states
  • \(F\) is a subset of \(Q\), the final states
  • \(\Delta\), the transition relation, a finite subset of \( Q \times (\Sigma \cup \{\epsilon\}) \times {Q}\), relates input states and alphabet symbols (or possibly the empty string) to output states

It is possible to implement this definition as a Haskell data type (representing here sets as lists).

data NFA q sigma =
  NFA
  [q] -- the set of states
  [sigma] -- the set of alphabet symbols
  [q] -- the set of initial states
  [(q,sigma,q)] -- the transition relation
  [q] -- the set of final states

We saw last time, however, that we could represent everything as a function by trading in sets for boolean valued functions.

data NFA q sigma =
  NFA
  (q -> Bool) -- the used states
  (sigma -> Bool) -- the used alphabet symbols
  (q -> Bool) -- initial states
  (q -> sigma -> q -> Bool) -- transitions
  (q -> Bool) -- final states

We can also mix and match functional and set-based representations. Which definition is most useful depends on our goals. Let us for the time being adopt the following mixed representation, which will be congenial for recognizing words.

data NFA q sigma =
  NFA
  [q] -- initial states
  (q -> Maybe sigma -> [q]) -- transitions
  (q -> Bool) -- final states

We have kept the initial state set a set, and the transitions are functions to a set of states, while the final states are represented as a function. This is because it is convenient to be able to have a concrete set of initial and transited-to states to manipulate (to give to the transition function, for example, or to check whether they are final), while with final states we only need to be able to decide whether or not a particular state is final. We have also discarded the set of all states and the set of used alphabet symbols from this representation because we won’t need them for now.

Somewhat unsurprisingly, the main difficulty in using non-deterministic automata to recognize words comes from the non-determinism about which state you are currently in. A simpler and easier to use model is the deterministic finite state automaton, which in a textbook might be defined as per the below.

A deterministic finite automaton is a 5-tuple \(\langle Q,\Sigma,q_{0},\delta,F\rangle\), where

  • \(Q\) is a finite set of states
  • \(\Sigma\) is a finite set of alphabet symbols
  • \(q_{0}\) is an element of \(Q\), the initial state
  • \(F\) is a subset of \(Q\), the final states
  • \(\delta\), the transition function, a partial function mapping pairs of input states and alphabet symbols to output states.

The three differences between deterministic and non-deterministic automata are:

  1. deterministic automata have just one initial state (whereas non-deterministic ones may have many)
  2. deterministic automata only have transitions reading letters (whereas non-deterministic ones may have transitions reading the empty string)
  3. deterministic automata transitions lead to at most one state (whereas non-deterministic transitions may lead to any number of states)

These differences can be reflected in the Haskell representation of DFAs:1

data DFA q sigma =
  DFA
  q -- initial state
  (q -> sigma :-> q) -- transitions
  (q -> Bool) -- final states

Membership

To determine whether a word is accepted by a DFA, we need to attempt to follow transitions in the DFA according to the letters of the word, starting at the initial state, and see whether we end up in a final state. We thus decompose acceptance into two parts:

  1. following transitions2

    Following transitions through a DFA in accord with an input word just involves taking a transition from a current state just in case the letter we hear next matches the letter on the transition in question.
    
    > followTrans :: DFA q sigma -> q -> [sigma] :-> q
    > followTrans (DFA _ delta _) = follow
    >   where
    >     follow q [] = Just q
    >     follow q (w:ws) =
    >       case delta q w of
    >         Just q' -> follow q' ws
    >         _ -> Nothing>
    
  2. checking whether we end in a final state

    This is easy, because our representation of DFAs contains a function which tells you whether a state is final or not!

    isFinal :: DFA q sigma -> q -> Bool
    isFinal (DFA _ _ final) = final
    

We can put these two parts together in the following way.

A DFA accepts a word just in case you end up in a final state by moving through the machine in accordance with the letters of the word.

> accepts :: DFA q sigma -> [sigma] -> Bool
> accepts m@(DFA start delta final) ws =
>   case followTrans m start ws of
>     Just q -> final q
>     _ -> False

Things are more difficult in case we are dealing with non-deterministic automata. One strategy is to keep track of all states in which we might be at any given point in time. In order to do this, we will use sets (of states) instead of lists.3

We would like to use sets, which are implemented in the Data.Set module.  We import it for use in our present file.  However, we don't want the function names it defines to clutter up our namespace.  Instead, we import them under the name 'S.'  In order to call a function /f/ from the Data.Set module, we must write 'S.f'.

> import qualified Data.Set as S

For the operation of set difference, it is nice to have a special infix operator, '\\'.

> infixl 9 \\
> (\\) = S.difference

Similarly for set union I would like to just write '+++'.

> infixl 9 +++
> (+++) = S.union

An important operation on sets is closing them under some function.  A set /s/ is closed under a function /f/ just in case any time you apply /f/ to some object /a/ in /s/, the result /f a/ is already in /s/.

> closure :: Ord a => (a -> S.Set a) -> S.Set a -> S.Set a
> closure update set = performClosure set set
>   where
>     performClosure visited toDo =
>       if S.null newToDo then visited
>       else performClosure updatedVisited newToDo
>       where
>         newToDo = foldMap update toDo \\ visited
>         updatedVisited = visited +++ newToDo

An NFA is defined as a triple of a set of initial states, a transition function, and a function telling us whether a state is final.

> data NFA q sigma =
>   NFA
>   (S.Set q) -- initial states
>   (q -> Maybe sigma -> S.Set q) -- transitions
>   (q -> Bool) -- final states

To follow transitions according to a word, we step through the machine *through multiple states at the same time*.

> followTrans :: Ord q => NFA q sigma -> S.Set q -> [sigma] -> S.Set q
> followTrans m@(NFA start delta final) = follow
>   where
>     emptyBall = closure (\q -> delta q Nothing)
>     follow qs [] = qs
>     follow qs (w:ws) = if S.null nextStates then S.empty
>                        else follow nextStates ws
>       where
>         nextStates =
>           emptyBall (foldMap (\q -> delta q (Just w)) qs)

A machine accepts a word just in case one of the states you land in by traversing it in accordance with the letters of the word is a final state.

> accepts :: Ord q => NFA q sigma -> [sigma] -> Bool
> accepts m@(NFA start delta final) = not . S.null . S.filter final . followTrans m start

Keeping track of all states like this amounts to virtually constructing a deterministic automaton, and moving through this automaton in the usual way. We can reify this process, by directly constructing a DFA from an NFA. The basic idea is that states of the DFA correspond to sets of states of the NFA.

To obtain a DFA from an NFA, we identify the (single) start state of the DFA with the /set/ of all start states from the NFA, and a state in the DFA is final just in case it contains one final state from the NFA.

> determinize :: Ord q => NFA q sigma -> DFA (S.Set q) sigma
> determinize (NFA start delta final) = DFA start dDelta (not . S.null . S.filter final)
>   where
>     emptyBall = closure (\q -> delta q Nothing)
>     dDelta qs w = let nexts = emptyBall (foldMap (\q -> delta q (Just w)) qs)
>                   in
>                     if S.null nexts then Nothing
>                     else Just nexts

A different way of navigating through a NFA involves pursuing each possible path on its own. It is conceptually clean to separate the specification/construction of the possible paths from their puruit. Accordingly, we define the space of possible paths for a particular word ws given a non-deterministic machine as a list of trees whose nodes are labeled with states, and where the daughters of a given node are labeled with all the possible states which can be reached from the mother by reading the next symbol of the input. The roots of the individual trees in the list are the start states of the machine.

The function _mkTree_ constructs a tree by means of an /update/ function and an initial 'seed' value.  The update function maps a value to a list of values.  The initial seed value is put at the root of the tree.  The daughter trees are constructed by making trees for each element in the list of updated values obtained by applying the update function to the initial value.

> mkTree :: (a -> [a]) -> a -> Tree a
> mkTree update = grow
>   where
>     grow seed =
>       Node seed -- the seed value is put at the root
>            (fmap grow (update seed)) -- and a tree is grown from
>                                      -- each of the updated values

The function _mapTree_ is the tree version of the map function on lists (_fmap_).  It modifies the values stored at each node in the tree in a way determined by the function you give it as an argument.

> mapTree :: (a -> b) -> Tree a -> Tree b
> mapTree modify = modifyTree
>  where
>    modifyTree (Node a ts) =
>      Node (modify a) -- the root is changed
>           (fmap modifyTree ts) -- each of the original daughters are
>                                -- changed in the same way

The trees of the paths resulting from walking through the machine in accord with an input word can be obtained via the function _mkTree_.  The seed values are the possible start states (each path begins at a particular start state), and a value is updated by reading the next letter in the word (and possibly taking some number of empty transitions).  In order to read the correct letter at any particular state, the value associated with a node is a pair of the current state, and the rest of the input word.

> treesOfPaths :: Ord q => NFA q sigma -> [sigma] -> [Tree q]
> treesOfPaths (NFA start delta final) ws =
>   fmap -- for each start state ...
>        (mapTree fst {- 3. get rid of the words associated with the
>                        nodes -}
>          . mkTree upd -- 2. create a tree
>          . (,ws)) -- 1. pair it with the input word
>        (S.elems start)
>  where
>    upd (q,[]) = [] {- if the input has been exhausted, then there is
>                       nothing left to do -}
>    upd (q,(w:ws)) =
>      fmap -- for each state,
>        (,ws) -- pair it with the remaining input
>        (nextStates q w) {- the states which can be reached by
>                            reading w -}
>    emptyBall =
>      closure (\q -> delta q Nothing)
>    nextStates q w =
>      S.elems (emptyBall (delta q (Just w)))

This list of trees represents all possible ways of navigating through the machine by reading the input. We can choose how we want to explore the paths independently of which paths there are! There are many ways to explore paths systematically, two of which are called depth first and breadth first. The depth first strategy can be described in the following way:

explore all the nodes contained a node’s left-sisters before exploring any nodes contained in any of its right-sisters

And the breadth first strategy can be described as:

explore all nodes at one depth before any nodes at a lower depth

A path in a tree extends from the root to a leaf.  Formally however it is just a sequence of node labels.

> type Path a = [a]

A tree traversal enumerates all paths in a tree.

> type Traversal a = Tree a -> [Path a]

A depth first traversal enumerates paths from left to right in the tree.  To implement a depth first traversal, it is helpful to maintain an intermediate data structure, recording the path taken to arrive at the current node.  Of course, when starting out at the root of the tree, the path taken is empty.

> depthFirst :: Traversal a
> depthFirst = dfs []
>   where
>    dfs path (Node a ts) = concatMap (dfs (a:path) ts)

Note that when visiting a node's daughters, the path to the daughters is the same as the path to their mother extended with their mother.


A breadth first traversal enumerates paths in a tree based on their distance from the root.  (If two paths are the same distance from the root, it enumerates them from left to right.)  Like the depth first implementation, each tree is paired with its path from the root.  In contrast to the depth first implementation, we cannot maintain just a single path, because we are enumerating all paths 'at once', and these lie along different paths from the root.

> breadthFirst :: Traversal a
> breadthFirst t = bfs [(t,[])]
>   where
>     bfs [] = []
>     bfs ((Node a ts,path) : tts) = bfs (tts ++ fmap (, (a:path)) ts)

  1. We use (:->) to represent partial functions, which is defined in the following way: type (:->) a b = a -> Maybe b. ↩︎

  2. Given a function f of type a -> Maybe b, we often find ourselves wanting to apply it to a value of type Maybe a. We very often do this by returning Nothing if the value of type Maybe a we want to apply it to is Nothing, and returning Just (f x) if the value of type Maybe a we want to apply it to is Just x. This happens so frequently that there is a very concise notation for it: haskell do x <- m f x This means we could rewrite the definition of followTrans as follows. haskell followTrans (DFA _ delta _) = follow where follow q [] = Just q follow q (w:ws) = do q' <- delta q w follow q' ws ↩︎

  3. Sets do not allow for multiple occurrances of one and the same element, unlike lists. Because of how sets are implemented in Haskell, only sets of orderable elements can be formed (i.e. the type of elements in a set must be an instance of the Ord typeclass). In addition, whereas we can have infinitely long lists, there can be at most 9.223.372.036.854.775.807 elements in a set. As this number is about 263, which is the number of elements in the powerset of a 63 element set, any time we have a machine with more than 63 states we could potentially run up against this bound. For reference, it is not uncommon to have machines with more than a million states in industrial NLP applications. ↩︎