Finite State Tree Transducers

1 Tree Transducers

A Tree transducer implements a function (or relation) from trees to trees. In contrast to strings, which have a left/right symmetry, trees have a very different structure depending on whether one navigates them top-down or bottom-up. We will focus on bottom-up transductions here.¹

Recall that we are using unranked trees; trees where the number of daughters that a node may have is not determined by its label. The reason for this is simply because it accords with linguistic practice; a VP node might have one daughter (in case of an intransitive V) or two daughters (in case it has a specifier), or three daughters (in some older analyses prior to the widespread adoption of a binary branching hypothesis). A simple inductive definition of such objects can be given as follows: a tree consists of a node label followed by a list of trees.

data Tree a = Node a [Tree a] deriving (Show,Eq)

According to this definition, every node has a list of daughter subtrees. A leaf is a Node with an empty list of daughter subtrees.

We begin with the concept of a tree homomorphism. A homomorphism can be thought of as a transducer with just one state. Thus, if a transducer is homomorphic, it cannot use the context of a node in any way to influence its translation. A homomorphism is therefore fully determined by its kernel – a function which specifies how to interpret each node in isolation. Note that a node is interpreted as a way of combining the interpretations of its daughters. So if a subtree is interpreted as a tree, a node must be interpreted as a way of combining a list of trees into another tree.

type Kernel a b = a -> [b] -> Maybe b
type Hom a b = Tree a -> Maybe b
extend :: Kernel a b -> Hom a b
extend ker (Node a ts) = traverse (extend ker) ts >>= ker a
{- traverse :: (a -> Maybe b) -> [a] -> Maybe [b]
if any element of the input list is mapped to @Nothing@, then the output is @Nothing@.
otherwise, behaves as @map@, ignoring the @Just@ wrapped around the outputs:
traverse f [] = Just []
traverse f (b:bs) =
  case f b of
    Nothing -> Nothing
    Just x ->
      case traverse f bs of
        Nothing -> Nothing
        Just xs -> Just (x : xs)

(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
If the input is @Nothing@, so is the output.
Otherwise, applies the function inside of the @Just@:
Nothing >>= f = Nothing
Just x >>= f = f x
-}

Note that each node in isolation must be interpreted as an operation that combines the translations of its daughters into some new output. Specifying (and then varying) these operations will give us a wide variety of different tree transducer models.

1.1 Bottom up tree acceptors

We begin however by looking not at transducers, but instead at acceptors! Acceptors must assign a tree to one of some finite number of categories, and on that basis decide to accept or reject the tree. Thus if subtrees are 'translated' into categories (or states), nodes must be functions from lists of categories to categories. We derive this function from a list of 'rules'.

type AccRule a b = ((a,[b]),b)
mkAccKernel :: (Eq a, Eq b) => [AccRule a b] -> Kernel a b
mkAccKernel rs a bs = lookup (a,bs) rs
{- lookup :: a -> [(a,b)] -> Maybe b

The function @lookup@ takes an object and a list of pairs, and
searches for a pair whose first component is the object we were given.
In case one is found, the second component is returned (wrapped in a
@Just@).  If no such pair is present, @Nothing@ is returned.
-}

While we could simply define a kernel directly (instead of via a list of rules), deriving them from rules restricts the kinds of homomorphisms we can define, to just those definable as an automaton. Functions in Haskell can have recursion, can pattern match, can do conditional branching, etc. None of these are part of how we want to define automata to work.

Given a designated subset of 'final' states, a state homomorphism can be used to accept or reject trees.

mkAcceptingHom :: (Eq a, Eq states) => [AccRule a states] -> [states] -> Hom a Bool
mkAcceptingHom rs finals = fmap (`elem` finals) . extend (mkAccKernel rs)
{- The function @fmap@ applies a function
to a @Maybe@ argument:
fmap f Nothing = Nothing
fmap f (Just x) = Just (f x)
-}

It is inconvenient, however, to have two ways of rejecting an input: Nothing means that the input was not successfully parsed, while Just False means that it was parsed, but rejected. We can turn a Hom a Bool into a function delivering a truth value (i.e. unwrapping the Maybe) by using the function or:

acceptOrReject :: Hom a Bool -> Tree a -> Bool
acceptOrReject h = or . h
{- or :: Maybe Bool -> Bool
or (Just b) = b
or Nothing = False
-}

mkAcceptor :: (Eq a, Eq states) => [AccRule a states] -> [states] -> Tree a -> Bool
mkAcceptor rs fins = acceptOrReject $ mkAcceptingHom rs fins

Of course, a (bottom up) tree automata can also be characterized as a machine which moves from leaves to root.

data BUTA states alph =
  BUTA {stateSet :: [states]
       , finals :: [states]
       , alphabet :: [alph]
       , transitions :: alph -> [states] -> Maybe states}

The transition function of such a machine can be extended so that it operates over entire trees. The somewhat too verbose code is intentionally so-written to highlight similarities with later, necessarily verbose code.

deltaBUTA :: (alph -> [states] -> Maybe states) -> Tree alph -> Maybe states
deltaBUTA d (Node a ts) = 
  do
    -- first, run the machine on the daughters
    states <- traverse (deltaBUTA d) ts
    -- then, determine the state of the tree based on the states of its daughters
    s <- d a states
    return s

The behaviour of such a machine (accepting or rejecting an input) depends on whether the state the machine associates with the input is a final state.

acceptBUTA :: Eq states => BUTA states a -> Tree a -> Bool
acceptBUTA bu = any (`elem` finals bu) . deltaBUTA (transitions bu)
  {- 
    @any@ applies a property to each element of a structure, and
    returns @True@ just in case the property was true of some element.
    In our case, we have an object of type @Maybe states@, which may
    or may not contain an object of type @states@.  So @any@ will be
    true just in case the object is of the form @Just s@, and @s@ is
    an element of the final state set.

    @any f@ is equivalent to @or . fmap f@
  -}

We can construct a tree automaton from a set of acceptor rules. When so doing, the transition function is simply the kernel so derived.

mkBUTA :: (Eq states, Eq alph) => [AccRule alph states] -> [states] -> BUTA states alph
mkBUTA rs fins = BUTA allStates fins letters (mkAccKernel rs)
  where
    (allLettersDups,allStatesDups) = unzip $ fmap (\((a,bs),c) -> (a,c:bs)) rs
    letters = L.nub allLettersDups
    allStates = L.nub $ concat (fins : allStatesDups)

1.2 Bottom up tree transducers

Next we look at functions that translate each subtree of a tree as a tree. Then each node of a tree must be a function from (a list of) trees to a tree. However, not just any function will do! The functions we wish to use should not inspect their inputs – this would be smuggling in the ability to be influenced by context! Instead, they should simply arrange their input trees in a uniform way into a larger tree. This amounts to associating with each node a tree with holes at the leaves, and plugging the inputs into the holes. This kind of object, a tree with holes at the leaves, is called a tree context.

data Context a = 
  NodeC a [Context a] | Hole Int 
  deriving (Show,Eq)

A tree context is implemented must as a tree was, except that there is the possibility to be a named Hole; the name is given as a number. Holes can never have any children beneath them, and so they are de facto leaves. The point of a context is to derive an operation over trees from it: put the input trees into the appropriate holes. We define a function, tree_substitution which turns a context into an operation on lists of trees.

tree_substitution :: Context a -> [Tree a] -> Maybe (Tree a)
-- when you have a hole with name i, put the ith element from the list in the hole, if one exists
tree_substitution (Hole i) ts =  get ts i
-- if you have a tree, keep the node label, and substitute the holes in its daughters
tree_substitution (NodeC a cs) ts = 
  do
    dtrs <- flip tree_substitution ts `traverse` cs
    return (Node a dtrs)

It will be convenient to have a way of extracting the contents of a tree context.

getNodes :: Context a -> [a]
getNodes (Hole _) = []
getNodes (NodeC a cs) = a : concat (getNodes <$> cs)

1.2.1 Tree Homomorphisms

A tree homomorphism is a finite association of tree contexts with node/number of daughter pairs. The tree context associated with a node/number of daughters pair should have the same number of holes as that node has daughters. This invariant is not easily enforced at the type level in Haskell (although it would be in Coq or Agda), and so we will not enforce it. We define a rule as a pair of a node label/integer pair and a tree context; it is well-formed iff the number of distinct holes in the tree context is less or equal to the integer.

type HomRule a b = ((a,Int),Context b)
-- intended interpretation of a rule (a,i) -> c is that c is a context
-- whose holes' names are all less than i.

Given a set of such rules, we can create a kernel.

mkHomKernel :: Eq a => [HomRule a b] -> Kernel a (Tree b)
mkHomKernel rules a bs =
  do
    t <- lookup (a,len) rules
    t' <- tree_substitution t bs 
    return t'
  where
    len = length bs

In contrast to the case of accepting homomorphisms, of type Hom a Bool, we can create a homomorphism of type Hom a (Tree b) by extending such a kernel.

mkBUHom :: (Eq a, Eq b) => [HomRule a b] -> Hom a (Tree b)
mkBUHom = extend . mkHomKernel

1.2.2 Tree Transducers

A (bottom-up) transduction can be viewed simply as a homomorphism which can take a limited (i.e. finite) amount of contextual information into account when replacing node labels with contexts. Such contextual information is called a state. We can define bottom up transducers given a set of modified rules, which instead of caring about the number of daughters had by a node, cares rather about the states of the machine after processing each of the daughter, and a list of final states.

type BURule inputNode state outputNode = ((inputNode,[state]),(state,Context outputNode))

Another way of thinking about this is that, if we modify our data structures from trees to pairs of states and trees, then our homomorphisms can implement more kinds of functions over trees.

mkBUKernel :: (Eq i,Eq s) => [BURule i s o] -> Kernel i (s,Tree o)
mkBUKernel rules a bs = 
  do
    (s,c) <- lookup (a,states) rules
    t <- tree_substitution c trees
    return (s,t)
  where
    (states,trees) = unzip bs

Our homomorphisms can be seen as the special case where the type of states is ().² If we have a set fs of final states, we can define a function which implements the transduction from trees to trees of a set of rules.

transduce :: Eq s => [s] -> Hom a (s,b) -> Hom a b
transduce fs h t =
  do
    (s,t') <- h t
    guard (s `elem` fs)
    return t'

{- guard :: Bool -> Maybe ()

@guard@ takes an expression and evaluates its truth.  If it is @True@,
computation proceedes.  If it is @False@, then @Nothing@ is returned.
-}

Given a kernel, we can use transduce to create a transducer. We wrap this up into a single function, which takes a set of rules and a final state, and produces a tree transduction

mkBUTrans :: (Eq a,Eq s) => [BURule a s b] -> [s] -> Hom a (Tree b)
mkBUTrans rs fins = transduce fins $ extend $ mkBUKernel rs

It is however convenient to have a data structure more canonically representing the machine-perspective on transductions.

data BU s i o =
  BU {states :: [s]
     , iAlph :: [i]
     , oAlph :: [o]
     , finalSts :: [s]
     , trans :: i -> [s] -> Maybe (s,Context o)}

We see that we can define a function to obtain a bottom up machine from a BURule.

mkBU :: (Eq i,Eq s,Eq o) => [BURule i s o] -> [s] -> BU s i o
mkBU rs fins = BU sts iAl oAl fins tr
  where
    (stsDups,iAlDups,oAlDups) = unzip3 $ map (\((i,ds),(s,c)) -> (s:ds,i, getNodes c)) rs
    sts = L.nub $ concat $ stsDups
    iAl = L.nub iAlDups
    oAl = L.nub $ concat oAlDups
    tr = curry (flip lookup rs)

Of course, what is of import is whether the bottom-up machine so obtained is related in some way to the homomorphism derived from the list of BURule's. To make sense of that we need to create some notion of what a bottom-up machine does. We can define functions that map bottom-up machines to transductions from trees to trees.

deltaBU :: (i -> [s] -> Maybe (s,Context o)) -> Hom i (s,Tree o)
deltaBU d (Node a ts) =
  do
    (states,trees) <- unzip <$> traverse (deltaBU d) ts
    (s,ctxt) <- d a states
    t <- tree_substitution ctxt trees
    return (s,t)

The claim is that, for any list rs of rules, any tree t, and a list fs of final states, transduce fs $ deltaBU $ d $ mkBU rs fs == transduce fs $ mkBUkernel rs.

2 Multi transducers

As noted previously, the kind of objects that a transducer associates with subtrees can be changed. Here we explore allowing not just trees, but lists of trees to be the output of a transduction.³ (Multi-homomorphisms will not be treated separately, we can obtain them as the special case of a multi-transducer where states are of unit type (and thus record no information).) A multi-rule is much the same as before, except that it outputs a list of contexts. Another 'difference' (really, making explicit what was implicit before) is that the previous assumption that each state corresponds to just one translated tree no longer holds. Thus we need to indicate along with each state the number of translated trees it corresponds to.

type MultRule i s o = ((i,[(s,Int)]),(s,[Context o]))

A multirule will have the form: $f(q_{1}(t_{1,1},\ldots,t_{1,k_{1}}),\ldots,q_{n}(t_{n,1},\ldots,t_{n,k_{n}})) \rightarrow q(c_{1},\ldots,c_{k})$ where f is the node of the tree being processed, the $q_{i}$ are the states of the transducer after processing the $i^{\mbox{th}}$ daughter, the $t_{i,j}$s are the trees associated with the $i^{\mbox{th}}$ state, and the $c_{i}$ are the contexts associated with the output state. Such a rule would be rendered in the following way as our Haskell type MultRule: ((f,[(q1,k1)..(qn,kn)]),(q,[c1..ck]))

Given a list of multi-rules, we can construct a Kernel.

mkMultKernel :: (Eq i,Eq s,Eq o) => [MultRule i s o] -> Kernel i (s,[Tree o])
mkMultKernel rules i bs = 
  do
    (sn,cs) <- lookup (i,ss) rules
    ts <- flip tree_substitution (concat trees) `traverse` cs
    return (sn,ts)
  where
    (states,trees) = unzip bs
    ss = zip states $ fmap length trees

We can again use transduce to allow us to filter out those tree tuples which are associated with the wrong states

mkMultTrans :: (Eq a,Eq s,Eq b) => [MultRule a s b] -> [s] -> Hom a ([Tree b])
mkMultTrans rs fins = transduce fins $ extend $ mkMultKernel rs

A multi-transducer is also much the same as our previous transducer. The difference is of course the type of the transition function, which is of course quite similar to the type of a multi rule.

data MultBU s i o =
  MultBU {statesMult :: [s]
         , iAlphMult :: [i]
         , oAlphMult :: [o]
         , finalStsMult :: [s]
         , transMult :: i -> [(s,Int)] -> Maybe (s,[Context o])}

As expected, we can construct a multi-transducer from a specification of the rules, and the final states.

mkMultBU :: (Eq i,Eq s,Eq o) => [MultRule i s o] -> [s] -> MultBU s i o
mkMultBU rs fins = MultBU sts iAl oAl fins tr
  where
    (iAlDups,stsDups,oAlDups) = unzip3 $ fmap (\((i,ss),(s,ctxs)) -> (i,s:(fmap fst ss),ctxs >>= getNodes)) rs
    sts = L.nub $ concat $ stsDups
    iAl = L.nub iAlDups
    oAl = L.nub $ concat oAlDups
    tr = curry (flip lookup rs)

3 Macro transducers

Using trees (or lists thereof) as the output data structure entails that nodes in the translations of different daughters must be independent (i.e. they must not dominate each other) in the final output structure. This can sometimes be too restrictive. For example, if we are transducing a tree into a string (qua monadic tree), we would like to be able to have nodes (i.e. letters) from one subtree dominate nodes from another one. The obvious solution to this problem is to change the output data structure. We would like a data structure which can be modified not only at the root (like a tree), but also at its frontier. Luckily, we have been using one such: Context. If our trees should be transduced into tree contexts, then each node must be an operation which takes contexts as arguments, and produces contexts; a context context.

data CContext a = 
  NodeCC a [CContext a] | CHole0 Int | CHole1 Int [CContext a]
  deriving (Show,Eq)

If we are going to be manipulating contexts, we will need to replace the holes of our contexts not with trees, but with other contexts. This operation is called context composition. It is, as code, identical to tree_substitution, but for the constructors on the right hand sides; in tree_substitution we return a Node a dtrs whereas here we return a NodeC a dtrs.

ctxt_composition :: Context a -> [Context a] -> Maybe (Context a)
ctxt_composition (Hole i) g = get g i
ctxt_composition (NodeC a cs) g = 
  do
    dtrs <- flip ctxt_composition g `traverse` cs
    return (NodeC a dtrs)

We will also need to perform the analogue of tree_substitution, but where we are substituting contexts for the internal holes of a context context! This operation makes use of context composition, so as to put the contexts which are the daughters of an internal hole inside of the context we are filling the hole in with.

cCtxt_substitution :: CContext a -> [Context a] -> Maybe (Context a)
-- if you are a leaf hole, that stays the same
cCtxt_substitution (CHole0 i) _ =  Just (Hole i)

-- if you are a node, just do the substitution on all daughters
cCtxt_substitution (NodeCC a ccs) cs = 
  do
    dtrs <- flip cCtxt_substitution cs `traverse` ccs
    -- doing the substitution on all daughters
    return (NodeC a dtrs)

-- if you are an internal hole, replace it with the appropriate
-- context (obtained via @get cs i@).  Because the daughters of this
-- guy are still @CContext@s, we need to turn them into @Contexts@ by
-- substituting into the open @CHole1@s the appropriate contexts from
-- @cs@.  Then we can compose the contexts.
cCtxt_substitution (CHole1 i ccs) cs =
  do
    ctx <- get cs i
    -- getting the appropriate context to replace the internal hole
    ctxts <- flip cCtxt_substitution cs `traverse` ccs
    -- turning the daughters of the internal hole into normal contexts
    ctxt_composition ctx ctxts

A Macro Tree Transducer rule is much the same as our previous transducer models (in particular, the bottom-up one); this time however nodes are associated with tree context contexts.

type MacroRule i s o = ((i,[s]),(s,CContext o))

Given a set of macro rules, we can construct a Kernel in the same way we always do.

mkMacroKernel :: (Eq i,Eq s) => [MacroRule i s o] -> Kernel i (s,Context o)
mkMacroKernel rules a bs = 
  do
    (s,cctx) <- lookup (a,states) rules
    ctx <- cCtxt_substitution cctx ctxts
    return (s,ctx)
  where
    (states,ctxts) = unzip bs

This allows us to construct a macro transduction using transduce:

mkMacroTrans :: (Eq i,Eq s) => [MacroRule i s o] -> [s] -> Hom i (Context o)
mkMacroTrans rs fins = transduce fins $ extend $ mkMacroKernel rs

Footnotes:

There is a free ebook 'Tree Automata Techniques and Applications' which covers tree automata (and transducers) in wonderful detail.

Note that the integer component of the type HomRule is obtainable as the length of the list of units in the type BURule.

The resulting tree transducer model is called multi-bottom-up tree transducers.