Week 11 [2024-06-11 Tue]

We discussed tell-tale sets, and extensions to MCFGs.

A tell-tale set for Clark’s algorithm must encode enough information about a language in order for the learner to reconstruct it. The relevant information about the language comes in the form of the production rules, which define it. Accordingly, we must encode each rule as a sentence of the language. A rule is built out of terminal and non-terminal symbols, and so we will encode terminals and non-terminals as strings, and combine them.

  • for a terminal or non-terminal \(X\), \(s(X)\) is the shortest (terminal) string derivable from \(X\) in zero or more steps.
    • if two or more strings are equally short, the one earlier in alphabetical order is chosen.

    • if \(X\) is already a terminal, \(s(X) = X\)

    • extend \(s(\cdot)\) over sequences of terminals and non-terminals homomorphically, so that \(s(AbC) = s(A)\cdot s(b)\cdot s( C)\), and \(s(\epsilon) = \epsilon\).

  • for a non-terminal \(X\), \(c(A)\) is the shortest context \((u,v)\) such that \(uXv\) is derivable in zero or more steps from the start symbol.
    • if two or more contexts are equally short, choose the one earlier in alphabetical order

Using these two operations, a rule of the form \(A \rightarrow \alpha\) is encoded as the string \(c(A) \odot s(\alpha)\). (Here, \((u,v) \odot w = uwv\).)

Homework

  1. Compute a tell-tale set for the grammar with the rules below:
    1. \(S \rightarrow aA\)
    2. \(A \rightarrow Sa\)
    3. \(S \rightarrow Bb\)
    4. \(B \rightarrow bS\)
    5. \(S \rightarrow c\)
  2. Run the congruential learner on this set, writing down the grammar learned.