A DFST Module

Module Exports and Imports

We announce to Haskell that we would like to bundle together the code in this file, and make it available to be used by other programs under the name Dfst. Furthermore, we make explicit which of the functions defined in this file the module should make available to other programs.

module Dfst
  ( SeqTrans
  , ForwardTrans
  , BackwardTrans
  , states
  , inAlph
  , outAlph
  , start
  , initial
  , delta
  , final
  , mkForwardTransducer
  , mkBackwardTransducer
  , transduce
  , compose
  ) where

We will use functions from other pre-defined modules. The modules we import (as well as the functions we wish to use from them) are given below.

import Dfsa as A (DFSA(A),recognizes)
-- yes, our very own module
import Data.Foldable as F (foldlM)
-- http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Foldable.html#v:foldlM
import Data.Maybe as M (isJust)
-- http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Maybe.html#v:mapMaybe
import Data.List as L (intersect,nub)
-- http://hackage.haskell.org/package/base-4.12.0.0/docs/Data-List.html
--  import Test.QuickCheck as Q
-- the module Test.QuickCheck provides us with tools for testing
-- whether a function we have designed is doing what it should.

Transducers

A machine implementing a function (or relation) is called a transducer. A transducer can be thought of as translating an input sentence into an output sentence. Perhaps the simplest model of translation is one where each word in the sentence is translated independently of any others. On this model, we might translate an English sentence like I withdrew money from the bank to the 'German' Ich zog zurück Geld von der Ufer. This is, however, a terrible attempt at a translation. The universally acknowledged problem with word-for-word translation schemes is that they don't take into account the context of a word. For example, the word the can be translated in six ways, depending on the gender, number, and case of its governing noun; our choice here of der was the wrong one (the most frequent form was chosen). A perennial problem of linguistics is to determine just what kind of context is necessary for an account of the use we make of language. In the present setting, we allow ourselves to remember a finite amount of information, which we represent as states. The translation of a word w will depend on the state we are currently in, and, similarly, the state we are currently in was determined by the state we were in before and the last word we saw. There is an initial state, and we allow that we begin a translation without having heard anything yet. Similarly, when the end of a sentence is reached, we may output a few final words depending on the state we find ourselves in.

We define sequential transducers as a new type of object, using the keyword data. Objects of this type can only be created using the S constructor, which takes five arguments in the following order:

the state set
the input alphabet
the output alphabet
the initial state
the initial output
the transition function
the final output function

data SeqTrans q b m = S { states :: [q]
                        , inAlph :: [b]
                        , outAlph :: m
                        , start :: q
                        , initial :: m
                        , delta :: q -> b -> Maybe (q,m)
                        , final :: q -> Maybe m
                        }

In contrast to the input alphabet, which we require to be symbols, and objects made up of those symbols to be lists, we place no such restrictions on the output. This is because, in contrast to the input, which we need to look at one symbol at a time, all that we need to do with the output is put the pieces of output together! Thus we require a way of assembling the pieces of whatever the output might be, and also a notion of an empty output. This is the concept of a monoid, which is a pre-defined typeclass in Haskell, which is defined as per the below. Note that lists (in particular strings of characters) are an instance of a monoid, where the empty element is the empty list, and putting lists together is done via list append.¹ ^,²

class Monoid m where
  mempty :: m
  mappend :: m -> m -> m
  mconcat :: [m] -> m

instance Monoid [a] where
  mempty = []
  mappend s t = s ++ t
  mconcat = concat

The final states are those over which the final output function is defined.

finals :: SeqTrans q b m -> [q]
finals s = filter (M.isJust . final s) $ states s

Sequential transducers are often presented so that the function delta is broken up into two independent functions, one which deals with next states, and one which deals with outputs. These can be defined in terms of delta.³

nextState :: SeqTrans q b m -> q -> b -> Maybe q
nextState s q b =
  do
    (q',_) <- delta s q b
    return q'

nextOutput :: SeqTrans q b m -> q -> b -> Maybe m
nextOutput s q b =
  do
    (_,m) <- delta s q b
    return m

Transiting

We define a function, deltaStar, which takes multiple steps through the transducer at once.

deltaStar :: Monoid m => (q -> b -> Maybe (q,m)) -> (q,m) -> [b] -> Maybe (q,m)
deltaStar _ qm [] = return qm                 -- if there are no more steps
                                              -- to take, return the
                                              -- current state/output pair
deltaStar d (q,m) (b:bs) =                    -- otherwise,
  do
    (q',m') <- d q b                          -- follow the next edge to a new state,
    deltaStar d (q',m <> m') bs               -- update the output, and continue

Transduction is achieved by outputting the initial output, appending it to the result of walking through the machine according to the input, and postpending to it the final output.

transduce :: Monoid m => SeqTrans q b m -> [b] -> Maybe m
transduce s bs =                                         -- to transduce bs in machine s,
  do
    (q,m) <- deltaStar (delta s) (start s,initial s) bs  -- first transit through the machine
                                                         -- starting from the initial state,
                                                         -- with the initial output
    m' <- final s q                                      -- then determine the final output from
                                                         -- the state where you end up
    return (m <> m')                                     -- and concatenate this final output
                                                         -- with the output from transiting
                                                         -- through the machine

Underlying Acceptors

A transducer defines a partial function from inputs to outputs. They are partial in the sense that not every possible input word must be mapped to an output. As they are partial, we can ask about the set of inputs on which they return an output, and about the set of outputs that the machine could return. As it turns out, both of these sets are regular - we can define them in terms of a DFSA. The set of inputs on which an output is returned is obtained simply by erasing the output portion of the transitions.⁴

domain :: SeqTrans q b m -> DFSA q b
domain s = A.A (states s) (inAlph s) (start s) (finals s) (nextState s)

This fact provides us with the means to obtain a simple demonstration that transducers are not closed under intersection. Consider the function mapping a string consisting of a sequence of the letter a followed by a sequence of the letter b to a sequence of the letter a, obtained simply by erasing all instances of the letter b. This is easily seen to be a sequential transduction:

data AsAndBs = As | Bs
eraseFinalB :: SeqTrans AsAndBs Char [Char] 
eraseFinalB = S [As,Bs] ['a','b'] ['a','b'] Bs "" d f
  where
    f _ = Just ""
    d As 'a' = Just (As,['a'])
    d Bs 'a' = Nothing
    d _ 'b' = Just (Bs,"")

We can describe the behaviour of this function (roughly) by writing that it maps inputs of the form \(a^{n}b^{\ast}\) to an output of the form \(a^{n}\); or, abbreviating, \(a^{n}b^{\ast} \mapsto a^{n}\). Here, \(a^{n}\) represents a sequence of length n, each element of which is the letter a. And \(b^{\ast}\) represents a sequence of any length, each element of which is the letter b.

Similarly, the function mapping the same inputs to strings of a, obtained by erasing the a in the input, and changing each input b to an output a, in symbols \(a^{\ast}b^{n} \mapsto a^{n}\), is also a sequential transduction:

changeBtoA :: SeqTrans AsAndBs Char [Char] 
changeBtoA = S [As,Bs] ['a','b'] ['a','b'] Bs "" d f
  where
    f _ = Just ""
    d As 'a' = Just (As,"")
    d Bs 'a' = Nothing
    d _ 'b' = Just (Bs,['a'])

Observe that if we intersect these two functions, in the sense that we obtain a new function mapping only those inputs to those outputs that both functions agree upon, this new function has the following behaviour: \(a^{n}b^{n} \mapsto a^{n}\). In other words, it is defined only on inputs where the number of a's in the beginning is equal to the number of b's in the end. This new function however is not a sequential transduction, as its domain, \(a^{n}b^{n}\), is not a regular set.

Combining Transducers

We would like to combine two transducers in such a way as to generate the composition of their respective functional relations. We will use a semi-colon to indicate this manner of combining transducers, and we would like to ensure that the language of the machine \(S;T\) is the composition of the languages of \(S\) and of \(T\): \(L(S;T) = L(T) \circ L(S)\). \(S;T\) means intuitively that on an input w, first we run \(S\) to obtain an output u, and then on that output u we run \(T\) to obtain the output v. We encounter many situations like this, in linguistics, in science, and in our daily life. The assumption that one must complete a first process before beginning a second is called by Ed Stabler the pedestrian's assumption, and it is, as we can see with a little bit of reflection, false.⁵ The interesting question for us, here, is how to interleave the processing of the input by both machines \(S\) and \(T\). One invariant of the transduction process is that output is performed at each input letter, and that later outputs do not change previous outputs. This suggests the following strategy: at each step we process the input character with the first transducer, obtain an intermediate output, and then, before moving on to the next input character, fully process this intermediate output with the second transducer. In order to do this, we need to record where we are in each machine at any given time. This recalls the product construction we used in combining automata, where states are pairs of states from the two underlying machines.⁶

compose :: Monoid m => SeqTrans q b [c]  -> SeqTrans r c m -> Maybe (SeqTrans (q,r) b m)
compose s t =
  case transduce t (initial s) of
    Nothing -> Nothing
    Just initialOutput -> Just
    (S newStates
      (inAlph s)
      (outAlph t)
      (start s, start t)
      initialOutput
      newDelta
      newFinal)
  where
    newStates = [(q,r) | q <- states s, r <- states t]
    deltaS = delta s
    deltaT = deltaStar (delta t)
    newDelta (q,r) b = do                 -- in state (q,r), reading b,
      (q',cs) <- deltaS q b               -- process b in machine s,
                                          -- obtaining cs
      (r',m) <- deltaT (r,mempty) cs      -- then process cs in machine
                                          -- t, obtaining m
      return ((q',r'),m)                  -- return where we are in each
                                          -- machine
    newFinal (q,r) = do                   -- (q,r) is a final state if
      cs <- final s q                     -- q is, outputting cs
      (r',m) <- deltaT (r,mempty) cs      -- which causes t to move
                                          -- from r to r', producing m
      m' <- final t r'                    -- and r' is, outputting m'
      return (m <> m')                    -- return m followed by m'

Directions

There are two ways to use a string as input:

reading from left to right (as in English)
reading from right to left (as in Hebrew)

As we have defined our operations, we are always reading from left to right. However, there is no reason to privilege operating on strings in this direction. Indeed, there are functions on strings that can be defined as a sequential transducer operating from right to left on its input that cannot be defined in terms of a sequential transducer operating from left to right on its input (and vice versa). A particularly simple example is given by the function that changes the first letter of a string to be the same as its last letter. Thus, the string abb is mapped to bbb, the string baab to itself, and the string ba to aa. That this cannot be implemented as a left-to-right transducer is due to the fact that we must delay the output of the first letter until we know the identity of the last letter, which means that nothing can be written until we have seen everything. This would require the transducer to remember everything it has seen, but it cannot, since it has only a finite memory. A sequential transducer operating in the reverse direction can implement this function rather straightforwardly; it simply remembers what the first letter it sees is (this is the last symbol of the input), and delays the output of each successive symbol by one time step, replacing the last symbol read (i.e. the first symbol of the input) with this. We can simulate walking through a string from right to left, while writing our output from right to left, by first reversing the string, writing the input (which is now backwards), and then reversing the output again. This restricts our choice of output monoid to one which supports the reverse operation, namely lists, but allows us to represent a reverse transducer by a normal one.

data StatesLToF = SeenNothing | FirstAPrevA | FirstAPrevB  | FirstBPrevA | FirstBPrevB deriving Enum
changeLastToFirst :: SeqTrans StatesLToF Char [Char]
changeLastToFirst = S [SeenNothing..FirstBPrevB] ['a','b'] ['a','b'] Nothing "" d f
  where
    d SeenNothing 'a' = Just FirstAPrevA
    d SeenNothing 'b' = Just FirstBPrevB
    d FirstAPrevA 'a' = Just (FirstAPrevA,"a")
    d FirstAPrevB 'a' = Just (FirstAPrevA,"b")
    d FirstAPrevA 'b' = Just (FirstAPrevB,"a")
    d FirstAPrevB 'b' = Just (FirstAPrevB,"b")
    d FirstBPrevA 'a' = Just (FirstBPrevA,"a")
    d FirstBPrevB 'a' = Just (FirstBPrevA,"b")
    d FirstBPrevA 'b' = Just (FirstBPrevB,"a")
    d FirstBPrevB 'b' = Just (FirstBPrevB,"b")
    f SeenNothing = Just ""
    f FirstAPrevA = Just "a"
    f FirstAPrevB = Just "a"
    f FirstBPrevA = Just "b"
    f FirstBPrevB = Just "b"

The machine changeLastToFirst runs from left to right, changing the last input symbol to the first one. If we reverse the input, and then reverse the output, we can simulate going from right to left.

revTrans :: SeqTrans q b [c] -> [b] -> [c]
revTrans t i = reverse $ transduce t $ reverse i

This allows us to write one machine and use it alternatingly as a left-to-right machine, or as a right-to-left machine. As long as we are very disciplined, and remember in each case which direction we intended to use a given machine, this is not a problem. However, there is no formal difference (yet) between left-to-right and right-to-left machines. In this section, we revisit our previous definitions, and reimplement them so that left-to-right and right-to-left machines have different types. We would like, however, to hide the details of this from the end user (in this case, our future selves), which means that we would like to have single operations for transduction and composition, that do different things depending on the kinds of machines they are given.

We begin by defining two types, for directions.

data Fwd
data Bwd

Both of these types are empty; they have no inhabitants. Their use will appear shortly. We also define another type for directions, which can be indexed by another type.

data Direction d where
  Forward :: Direction Fwd
  Backward :: Direction Bwd

This uses generalized algebraic data type (GADT) syntax to define singleton types, types with only one inhabitant. The types Direction Fwd and Direction Bwd are different types, and each of which is the type of just one object (Forward and Backward respectively). There is a type Direction a for each type a, but, with the exception of the above two, all are empty.

We will redefine transducers so that they have a Direction component.

data SeqTrans q b m d where
  S :: Direction d -> [q] -> [b] -> m -> q -> m -> (q -> b -> Maybe (q,m)) -> (q -> Maybe m) -> SeqTrans q b m d

We will want to hide this data type from end users, so that a user of this module may not interact with it. Accordingly, we define functions which construct transducers with both settings of this parameter. We also define convenient type abbreviations.

type ForwardTrans q b m = SeqTrans q b m Fwd
mkForwardTransducer ::  [q] -> [b] -> m -> q -> m -> (q -> b -> Maybe (q,m)) -> (q -> Maybe m) -> ForwardTrans q b m
mkForwardTransducer = S Forward
type BackwardTrans q b m = SeqTrans q b m Bwd
mkBackwardTransducer ::  [q] -> [b] -> m -> q -> m -> (q -> b -> Maybe (q,m)) -> (q -> Maybe m) -> BackwardTrans q b m
mkBackwardTransducer = S Backward

We now by and large replicate the development of transducers from above, with the primary difference being the direction type parameter, and the Direction parameter of the constructor. We define by pattern matching the selector functions on sequential transducers. These ignore (i.e. are polymorphic in) the direction parameter.

direction :: SeqTrans q b m d -> Direction d
direction (S dir _ _ _ _ _ _ _) = dir
states :: SeqTrans q b m d -> [q]
states (S _ qs _ _ _ _ _ _) = qs
inAlph :: SeqTrans q b m d -> [b]
inAlph (S _ _ iA _ _ _ _ _) = iA
outAlph :: SeqTrans q b m d -> m
outAlph (S _ _ _ oA _ _ _ _) = oA
start :: SeqTrans q b m d -> q
start (S _ _ _ _ st _ _ _) = st
initial :: SeqTrans q b m d -> m
initial (S _ _ _ _ _ i _ _) = i
delta :: SeqTrans q b m d -> (q -> b -> Maybe (q,m))
delta (S _ _ _ _ _ _ d _) = d
final :: SeqTrans q b m d -> (q -> Maybe m)
final (S _ _ _ _ _ _ _ f) = f

We would like to define a general transduction function, which decides what exactly to do based on the directionality of the machine. As we will be using the 'trick' of simulating a right-to-left transduction by reversing the input, transducing in the left-to-right direction, and then reversing the output again, we need to have a way of changing the direction of a machine, at least temporarily.

reverseDirection :: BackwardTrans q b m -> ForwardTrans q b m
reverseDirection (S Backward s iA oA st i d f) = mkForwardTransducer s iA oA st i d f

Exactly what the function transduce does with its argument depends on the value of the Direction parameter of the machine. If it is Forward, transduce looks just the same as before.

transduce :: Monoid m => SeqTrans q b m d -> [b] -> Maybe m
transduce t@(S Forward _ _ _ _ _ _ _) bs =
  do
    (q,m) <- deltaStar (delta t) (start t,initial t) bs 
    m' <- final t q                                      
    return (m <> m')

If it is backward, we simulate the right-to-left transduction with a left-to-right one.

transduce t@(S Backward _ _ _ _ _ _ _) bs = reverse <$> transduce (reverseDirection t) $ reverse bs

Another main reason for introducing directional types in our sequential transducers is found in the composition function: here, both composands must have the same directionality, and the resulting machine must also be used in that same direction. Composition of machines running in different directions (using the construction above) does not make intuitive sense. Now it does not make formal (i.e. type theoretic) sense either. Note that virtually the same code as before is used.

compose :: Monoid m => SeqTrans q b [c] d  -> SeqTrans r c m d -> Maybe (SeqTrans (q,r) b m d)
compose s t =   case transduce t (initial s) of
    Nothing -> Nothing
    Just initialOutput -> Just
      (S
       (direction s) 
       newStates
       (inAlph s)
       (outAlph t)
       (start s, start t)
       initialOutput
       newDelta
       newFinal)
  where
    newStates = [(q,r) | q <- states s, r <- states t]
    deltaS = delta s
    deltaT = deltaStar (delta t)
    newDelta (q,r) b = do
      (q',cs) <- deltaS q b
      (r',m) <- deltaT (r,mempty) cs
      return ((q',r'),m)
    newFinal (q,r) = do
      cs <- final s q  
      (r',m) <- deltaT (r,mempty) cs
      m' <- final t r'
      return (m <> m')

Footnotes:

The particular interest of formulating outputs in terms of monoids lies in the possibility of using other, less obvious types of outputs, such as integers (for counting), real numbers (for probabilities), or finite sets (for non-determinism).

newtype Sum a = Sum a
instance Num a => Monoid (Sum a) where
  mempty = Sum $ fromInteger 0
  Sum i `mappend` Sum j = Sum $ i + j

newtype Prod a = Prod a
instance Num a => Monoid (Prod a) where
  mempty = Prod $ fromInteger 1
  Prod i `mappend` Proj j = Prod $ i * j

newtype FinSet a = FinSet [a]
instance Monoid m => Monoid (FinSet m) where
  mempty = FinSet []
  FinSet sa `mappend` FinSet sb =
    FinSet [a `mappend` b | a <- sa, b <- sb]

The monoidal multiplication (mappend), can be written as <>. This is because every Monoid is a Semigroup, and the single operation of a Semigroup is written such.

class Semigroup a where
  (<>) :: a -> a -> a
class Semigroup a => Monoid a where
  mempty :: a
  mappend :: a -> a -> a
  mappend = (<>)

Recall that, if we have a value of type Maybe a, we can use a do-block to extract the value of type a that it (might) contain:

-- mi is of type 'Maybe Int'
do
  i <- mi
  return (i + 3)

In this code block, mi is of type Maybe Int. That is, it may or may not be an Int (it is either Nothing, or Just i for some integer i). Inside the do-block, we write i <- mi to extract the Int that may be lurking in the mi. If there is none, the entire do-block simply gives the value Nothing. If there is one, however, this is assigned to the variable i, and the computation continues. The statement in the last line of the do-block must be of type Maybe Int. In this case, the function return wraps up an integer inside of a Just.

⁴

This can be heuristically verified using the Test.QuickCheck module. We would like to verify that the set of strings accepted by the domain automaton of a transducer is identical to the set of strings the transducer successfully transduces; in symbols: \(\{bs | \texttt{recognizes}\ (\texttt{domain}\ s)\ bs\} = \{bs | \texttt{isJust}\ (\texttt{transduce}\ s\ bs)\}\). Such an equality between sets reduces to two inclusions, the left-to-right inclusion of the left-hand set in the right-hand set, and the right-to-left inclusion of the right-hand set in the left-hand set. Accordingly, we define a property (prop_domainTransducer) which arbitrarily chooses one of two properties to test, on any given input. The first, domainTransLR, takes a transducer and a string as arguments, such that the acceptor obtained from the transducer recognizes the string, and returns True just in case the transducer successfully transduces the string. The second, domainTransRL, takes a transducer and a string as arguments, such that the transducer successfully transduces the string, and returns True just in case the acceptor obtained from the transducer recognizes the string.

prop_domainTransducer :: (Monoid m, Eq q, Eq b) => SeqTrans q b m -> [b] -> Property
prop_domainTransducer s bs =
  domainTransLR bs .&. domainTransRL bs
  where
    domainTransLR bs =
      recognizes (domain s) bs ==> isJust (transduce s bs)
    domainTransRL bs =
      isJust (transduce s bs) ==> recognizes (domain s) bs

However, when we attempt to test whether a transducer satisfies the desired property on 100 random inputs, we get a warning message:

*Main> quickCheck (prop_domainTransducer eraseFinalB)
*** Gave up! Passed only 92 tests.

The reason for this is that most of the random inputs did not satisfy the preconditions of our property (i.e. most random inputs are not strings accepted by the automata, or strings successfully transduced by the transducer). This is partly because our machine only expects strings consisting of letters 'a' and 'b', whereas arbitrary character strings contain many more letters. We can explicitly take control of the random input generation process, directing Haskell to generate those inputs which are more reasonable. We do this by defining a newtype wrapper for our type [Char] (aka String), and defining our own way of creating an arbitrary element thereof. This is done by declaring our new type an instance of the Arbitrary typeclass (to which the quickCheck function refers). Here, we have decided to consider only strings of the following form: \(tuvw\), where each of t, u, v, and w is a string of at most length 100 consisting entirely of either the letter 'a' or the letter 'b'.

newtype Alph = Alph {unAlph :: String} deriving Show
instance Arbitrary Alph where
  arbitrary = do                               -- to make an arbitrary 'Alph'
    i <- choose (0 :: Int,4)                   -- first choose an 'Int' between 0 and 4
    makeAlph i                                 -- then 
      where
        makeAlph 0 = return $ Alph ""          -- if you chose 0, return the empty string
        makeAlph i = do                        -- otherwise
          len <- choose (0,100)                -- choose a length between 1 and 100
          c <- elements ['a','b']              -- and a letter, then
          Alph w <- makeAlph (i - 1)           -- make the rest of the string
          return (Alph (replicate len c ++ w)) -- and copy the letter
                                               -- len times in front
                                               -- of the string

We can now test that our simple transducer satisfies these properties on 100 random inputs, of the above shape.

*Main> quickCheck (prop_domainTransducer eraseFinalB . unAlph)
+++ OK, passed 100 tests.

⁵

Stabler gives an example of a chef preparing a dinner (first task) and then serving it (second task). He says:

only the most inexperienced chef would finish the preparation before beginning the serving

⁶

We would like to verify that composition as we have implemented it correctly produces the result of running the second machine on the output of the first. We define a property which checks whether the output of the composed transducer on a string is the same as that of the two transducers applying sequentially.

prop_composeCorrect :: (Monoid m,Eq m)
                    => SeqTrans q b [c] -> SeqTrans q' c m -> [c] -> Property
prop_composeCorrect s t bs =
  ((s `compose` t) `transduce` bs) == ((s `transduce` bs) >>= (t `transduce`))