Word-based analyses

The previous posts have introduced a formalism in which we may write minimalist analyses. The formalism provides a system for constructing links between lexical items, which, while initially unfamiliar, are equivalent to (intertranslatable with) the structures involving movement that we normally use. While links between lexical items could in principle be drawn in many ways (e.g. ‘if your graph has a prime number of nodes, draw a link between two at random’), the system we have been exploring lexicalizes the link-drawing procedure by putting features on lexical items. This is a useful starting point, although we can imagine ways of relaxing it.1
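To make the feature-driven link-drawing procedure concrete, here is a minimal sketch in Haskell (the types, names, and simplifications are mine, not part of the formalism as presented): a link between two lexical items is licensed exactly when the first unchecked feature of one item selects the category the other currently exposes.

```haskell
-- Features: a selection feature looks for a matching category feature.
data Feature = Sel String  -- e.g. Sel "d" for •d
             | Cat String  -- e.g. Cat "d" for d
  deriving (Eq, Show)

-- A lexical item: a pronunciation paired with its unchecked features.
data Item = Item { phon :: String, feats :: [Feature] } deriving Show

-- Try to draw a link: on success, both triggering features are checked
-- off and the link is recorded as a pair of pronunciations. Leftover
-- features on the selectee (which would drive movement) are ignored here.
connect :: Item -> Item -> Maybe (Item, (String, String))
connect (Item p (Sel f : fs)) (Item q (Cat g : _))
  | f == g = Just (Item p fs, (p, q))
connect _ _ = Nothing

-- connect (Item "eat" [Sel "d", Cat "v"]) (Item "apples" [Cat "d"])
--   == Just (Item "eat" [Cat "v"], ("eat","apples"))
```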

We have implemented a number of extensions to this basic system, such as directionality of selection and movement, and covert movement. We have seen (or I have at least claimed) that these are best thought of not as modifications to the underlying syntactic system, but as modifications to how syntactic structures are interpreted at the interfaces. I have also alluded in a previous post to the fact that we will be conceptualizing agreement not as a separate syntactic operation (on a par with merge/move, or rather ‘connect lexical items’), but likewise as more of an interface operation. Altogether, this covers pretty much the gamut of ‘what minimalist syntacticians want to have in their syntactic toolbox’, except for one glaring omission: we have not spoken at all of head movement, or of how morphology and syntax should interact. This is in fact one of the topics of this course.

Grammatical Sins

Our analyses (in homeworks, and examples) have all involved silent lexical items. This is no wonder, given that the syntactic tradition we have learned makes prolific use of them. The appeal to silent lexical items is, however, a point of contention in many other traditions. Let me briefly try to argue why using silent lexical items is bad, all else being equal. Our grammars work by assigning properties to words, which, in conjunction with the operations of the grammar, determine the predicted distribution of that word. Each word occurs just where it occurs and nowhere else; in other words, each word has exactly one actual distribution - its own. In the ideal case, a word’s actual distribution would be exactly captured by the properties the grammar assigns to it - its predicted distribution should line up with its actual one. Postulating syntactic ambiguity is an admission that our grammar is incapable of describing the actual distribution of that word in the language - in order to describe its actual distribution, we are forced to divide it up into multiple predicted distributions.2 Using silent lexical items is similarly an admission that our grammar is incapable of describing the actual distribution of words in a language: in order to describe the distribution of the words we actually observe, we are forced to posit other ‘words’ which have no observable distribution at all.

There are thus two kinds of ‘sins’ that a grammar formalism can be guilty of:3

  • needing to use multiple predicted distributions in order to capture the actual distribution of a word
  • resorting to silent lexical items

These sins can of course be combined - we could have multiple silent lexical items!

Homophony

The two sins above also differ in another way: there is a very simple way to enrich our grammar formalism so as to be able to combine multiple predicted distributions into one. We can add a feature-bundle combining operator, written \(\oplus\), and for every word w with multiple lexical entries \(\textsf{w}:\alpha,\ \textsf{w}:\beta,\ \textsf{w}:\gamma,\ \ldots\) we can combine them into one monolithic entry: \(\textsf{w}:\alpha \oplus \beta \oplus \gamma \oplus \dots\). To work with this new kind of feature bundle, we can add an operation, Select, which takes a lexical item of that form and chooses one of the sub-feature bundles. This is clearly cheating, but it is not so easy to draw the line: is combining parts of feature bundles in this way also cheating? Consider the following hypothetical lexical entry for eat: \[\textsf{eat}:(\bullet d \oplus (\bullet d.d\bullet)).v\] This lexical entry expresses that eat can be either intransitive (selecting just one d argument) or transitive (selecting two). This doesn’t seem as egregious a sin as simply gluing two independent feature bundles together, but it is not immediately obvious how to draw a principled distinction between the two strategies.4, 5

The higher-level perspective is that our grammars (operations together with feature-bundle notations) allow us to refactor analyses at the level of the lexicon. In this particular case, having a ‘choice’ operator \(\oplus\) on hand allows us to change our grammatical analysis by combining multiple lexical entries into one, and we can perform algebraic manipulations on feature bundles to further simplify our grammatical descriptions. Note that the ways in which we can refactor our analyses by manipulating lexical items depend crucially on the operations and representations available to us - adding the \(\oplus\) operator allowed us to do things at the lexical level that we could not do before.
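Here is a minimal sketch of \(\oplus\) and Select in Haskell (the types and names are mine): Select is modelled as expansion of a bundle-with-choices into the set of plain feature bundles it abbreviates.

```haskell
-- Atomic features, following the post's notation as I read it.
data Feature = SelBefore String  -- •d : a selection feature written •x
             | SelAfter  String  -- d• : a selection feature written x•
             | Cat       String  -- v  : a plain category feature
  deriving (Eq, Show)

-- A bundle is a sequence of items, each either a single feature or a
-- choice (⊕) between two sub-bundles.
data BItem  = F Feature | Choice Bundle Bundle deriving Show
type Bundle = [BItem]

-- Select: expand a bundle into the plain feature bundles it abbreviates,
-- by resolving every choice point both ways.
select :: Bundle -> [[Feature]]
select []                  = [[]]
select (F f : rest)        = [ f : fs | fs <- select rest ]
select (Choice l r : rest) =
  [ pre ++ fs | pre <- select l ++ select r, fs <- select rest ]

-- The hypothetical entry  eat : (•d ⊕ (•d.d•)).v
eat :: Bundle
eat = [ Choice [F (SelBefore "d")]
               [F (SelBefore "d"), F (SelAfter "d")]
      , F (Cat "v") ]

-- select eat == [ [SelBefore "d", Cat "v"]                  -- •d.v
--               , [SelBefore "d", SelAfter "d", Cat "v"] ]  -- •d.d•.v
```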

Silent sins

In practice, silent lexical items often turn out to be unnecessary (in other grammatical traditions).6 Extending the reasoning above a bit, we should disprefer grammatical frameworks which push us to postulate them (i.e. minimalist/transformational ones). However, intuitively, silent lexical items are postulated for a reason: they allow us to factor out regularities across lexical items, which might allow us to eliminate homophony! Taking as our example the flexible transitivity of eat, instead of adding a choice operator to our feature-bundle representation scheme, we can postulate a single, uniform lexical entry for eat, \(\textsf{eat}:V\), together with three silent lexical items \(\epsilon: \bullet AgrO.d\bullet.v\), \(\epsilon: \bullet V.d\bullet.AgrO\), and \(\epsilon:\bullet V.AgrO\). In words (a sketch of the resulting derivations follows the list):

  • there is a single lexical head, eat
  • there are two functional heads, both labelled AgrO: one which adds an internal argument, and one which does not
  • there is a functional head, v, which adds an external argument
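To see how these entries conspire to reproduce the two frames, here is my own walk-through of the feature checking (glossing over linear order), writing \(\Rightarrow_{+d}\) for the step that merges a d argument: \[\epsilon:\bullet V.d\bullet.AgrO \;+\; \textsf{eat}:V \;\Rightarrow\; d\bullet.AgrO \;\Rightarrow_{+d}\; AgrO\] \[\epsilon:\bullet AgrO.d\bullet.v \;+\; \ldots:AgrO \;\Rightarrow\; d\bullet.v \;\Rightarrow_{+d}\; v\] The intransitive frame instead routes through \(\epsilon:\bullet V.AgrO\), which skips the internal-argument step.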

The intuition here is then that silent lexical items allow us to ‘factor apart’ our lexical items; this gives us in essence the reverse of the behaviour we saw with the choice operator (which allowed us to combine them). This intuition has some formal justification: although it is always possible to refactor context-free grammars so as to avoid silent lexical items (usually presented as rules with empty right-hand sides) altogether, having silent lexical items greatly reduces the size of the lexicon (Gruska, 1975).7
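To make the context-free case concrete, here is a toy sketch (my own illustration of the standard construction, not Gruska's result itself) of silent-rule elimination, showing the blowup it induces:

```haskell
import Data.List (nub)

-- A toy CFG: nonterminals are upper-case strings, terminals lower-case.
type Symbol = String
type Rule   = (Symbol, [Symbol])  -- lhs -> rhs; empty rhs = silent rule

-- The nullable nonterminals: those that can derive the empty string.
nullable :: [Rule] -> [Symbol]
nullable rules = go []
  where
    go ns | ns' == ns = ns
          | otherwise = go ns'
      where ns' = nub (ns ++ [ a | (a, rhs) <- rules
                                 , all (`elem` ns) rhs ])

-- For each rule, add every variant obtained by dropping some nullable
-- symbols from its rhs, then discard the silent rules themselves.
-- (The usual fix-up for a nullable start symbol is omitted here.)
eliminate :: [Rule] -> [Rule]
eliminate rules =
  nub [ (a, rhs') | (a, rhs) <- rules
                  , rhs' <- variants rhs
                  , not (null rhs') ]
  where
    ns = nullable rules
    variants []       = [[]]
    variants (s : ss) = [ pre ++ suf | suf <- variants ss
                                     , pre <- if s `elem` ns
                                              then [[s], []]
                                              else [[s]] ]

-- One rule over three optional nonterminals ...
toy :: [Rule]
toy = [ ("S", ["A","B","C"])
      , ("A", []), ("A", ["a"])
      , ("B", []), ("B", ["b"])
      , ("C", []), ("C", ["c"]) ]

-- ... explodes into seven S-rules (one per nonempty subset of {A,B,C})
-- after elimination: exactly the succinctness that silent items buy us.
```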

Homework

In preparation for next time, analyze the following sentences without silent lexical items. Please take a whole-word perspective, so that the lexical items are actual words. Note that, without the homophony-reducing power of silent heads, you will almost certainly end up with an analysis containing many copies of ‘the same word.’ This is ok.
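As a purely illustrative starting point (the clause category t below is my own label, not one fixed in the posts): sentence 1 might come out as just two whole-word entries, \(\textsf{John}:d\) and \(\textsf{jumped}:d\bullet.t\), with jumped selecting its subject directly. The later sentences will then force duplication - for instance, the finite past jumped of sentence 1 cannot plausibly be the same lexical item as the participle jumped selected by had and has, so the string jumped will need at least two entries.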

  1. John jumped.
  2. John didn’t jump.
  3. John jumps.
  4. John doesn’t jump.
  5. John will jump.
  6. John won’t jump.
  7. John had jumped.
  8. John hadn’t jumped.
  9. John has jumped.
  10. John hasn’t jumped.
  11. John will have jumped.
  12. John won’t have jumped.
  13. John was jumping.
  14. John wasn’t jumping.
  15. John is jumping.
  16. John isn’t jumping.
  17. John will be jumping.
  18. John won’t be jumping.
  19. John had been jumping.
  20. John hadn’t been jumping.
  21. John has been jumping.
  22. John hasn’t been jumping.
  23. John will have been jumping.
  24. John won’t have been jumping.

  1. We might allow links to be drawn, without any syntactic features involved at all, between two items which stand in a semantic function-argument relation (for example, a verb and its argument). We would have to keep track of the semantic type of expressions, and of whether a particular semantic argument position has already been saturated, but this is certainly conceivable. Another option might be to allow certain links to be drawn where only one of the involved lexical items has any features. For example, a lexical item might have an EPP feature, and so we might draw a link between it and something else (perhaps even the closest something else). One can imagine many more possibilities. These two have been discussed in the literature: the first under the rubric of reducing c-selection to s-selection, and the second is, I think, a fairly common way of understanding agreement (where a probe has unvalued features, and searches for a goal from which it can get values). ↩︎

  2. This argument is agnostic about the proper treatment of ‘real’ polysemy, as in the English bank (or perhaps that). If you want to allow multiple lexical entries for real polysemy, simply understand the word ‘word’ in the argument in the appropriate way. ↩︎

  3. I am using the word sin in the sense of ‘falling short of an ideal.’ ↩︎

  4. I don’t think that there is a qualitative distinction, but there is clearly a quantitative one: the size of the feature bundle in the second case (measured in terms of number of features) is smaller (4 vs 5) than it would be had we just combined two independent feature bundles (\(\bullet d.v\) and \(\bullet d.d\bullet.v\)) together. Of course, if we add a new symbol, 1, and interpret it as ‘nothing’ (the unit for \(\oplus\)), we could write the desired feature bundle as \((\bullet d \oplus 1).d\bullet.v\), which uses three features, and expresses directly that the complement of eat is optional. ↩︎

  5. I am not doing full justice to this idea by presenting it in terms of feature bundles. Feature bundles are inherently first order; they are structured as lists of features, and each feature is the positive or negative version of an atomic feature type. So all we are doing is writing grammars to generate (usually finite) sets of feature bundles. Grammar formalisms with a less impoverished notion of a feature bundle, like HPSG or categorial grammar, have not only richer ways of combining features, but also features which can themselves have a recursive structure. So (abusing notation) in categorial grammar you could have a single feature of the form \(\bullet (\bullet d\oplus 1)\) which would match against an entire sequence of features (it would be a feature that selects for an optional selection feature). In this case, what amounts for us to a way of abbreviating a set of feature bundles would interact in rich ways with the grammatical operations of other traditions. Note that our use of dependency structures comes from the categorial grammar tradition (which inherits them from linear logic), where they are called proof nets. ↩︎
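A minimal sketch of the first-order/higher-order contrast (types mine):

```haskell
-- First order (our setting): a bundle is a flat list of atomic features.
data Flat = FSel String | FCat String deriving Show
type FlatBundle = [Flat]

-- Higher order (categorial-grammar flavoured): a feature may itself
-- contain features, so •(•d ⊕ 1) is a single well-formed feature.
data Feat = Atom String        -- d, v, ...
          | SelF Feat          -- •x : select an x
          | ChoiceF Feat Feat  -- x ⊕ y
          | Unit               -- 1, the unit of ⊕
  deriving Show

-- The single feature that selects for an optional d-selector:
optionalSelector :: Feat
optionalSelector = SelF (ChoiceF (SelF (Atom "d")) Unit)
```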

  6. Michael Moortgat once lamented that he could get rid of everything but a silent complementizer: compare I believe that Mary laughed with I believe ∅ Mary laughed. ↩︎

  7. Silent lexical items are formally eliminable in many different grammar formalisms, from that of finite state machines (where they take the form of silent transitions from one state to another), to, well, silent lexical items in categorial grammar, tree adjoining grammar, and minimalist grammar. As far as I know, there are no formal results on the succinctness of these grammars with and without silent lexical items. ↩︎