Formal foundations of minimalist syntax

Chomsky has said repeatedly that minimalism is a program, and not a theory. The intent is to emphasize that there is a coherent research program, which is independent of any particular instantiation of it. Thus it makes perfect sense to have Cinque-style Cartography coexist alongside the more spare Chomskyan V-v-T-C clausal backbone; both can be different (perhaps incompatible) instantiations of the same general research program. Indeed, there are innumerable (sometimes slight) variations of theories in the minimalist vein.

Similarly, a formalisation of minimalism will of necessity cleave to a particular instance of the program. Yet a good formalisation will also show how to modify it so as to formalise different instances within the program. Issues that seem important will often be revealed to be minor, unimportant design decisions (whereas sometimes apparently slight variations in theories can reveal themselves to be substantive). Ed Stabler formalized an early version of minimalism in the late nineties. From the standpoint of the present, it looks somewhat austere: there is no discussion of agreement, no copies, no set-theoretic structures, no interpretable/uninterpretable distinction, no indices on traces, no sidewards movement, no counter-cyclic operations, no structure removal, etc. Yet we will see in the course that these are idiosyncrasies of the instance of the program he formalized, not limitations on the (platonic form of the) formalism itself.

The basic idea is that features of lexical items drive the syntactic computation.1 These features are reified into what we will continue to call features, and arranged into feature bundles.2 We will assume there to be only a finite number of syntactically relevant properties of lexical items. Let these be given in a set \(At\) of atomic features. A lexical item might have multiple features; these will be structured as a list of features, called a feature bundle.3 Borrowing terminology from the dependency literature, in a complete sentence, a lexical item will govern certain others, and will be a dependent of (or be governed by) still others. In constituency terminology, we might rephrase this as saying that a lexical item will contain other (maximal projections of) lexical items in its maximal projection, and will itself be contained in the maximal projections of other lexical items. These are two very different things, and accordingly a lexical item will have two kinds of features: one kind (the positive features) relevant for determining which lexical items it will govern, and another kind (the negative features) relevant for determining which lexical items may in turn govern it. A positive feature will be written as \(\bullet x\), and a negative one as \(x\), for any \(x \in At\).4

The point of positive features is to indicate a need to govern something, and the point of negative features is to indicate a need to be governed by something. As an example, consider a wh-determiner like which. Which requires an NP complement, itself forms a DP, and requires being licensed as a wh-word. Its syntactic features can then be represented as follows. It needs to select an NP, so it has a feature \(\bullet n\). It can be selected as a DP, so it has a feature \(d\). And it must be further licensed as a wh-word, so it has a feature \(w\). We would like words to satisfy their selectional properties before themselves being selected for, and so in its feature bundle the positive feature (\(\bullet n\)) must precede the others. General syntactic considerations dictate that it should be selected for before it moves to a wh-licensing position, and so in its feature bundle the negative feature \(d\) should precede the negative feature \(w\). Thus, the lexical entry for the word which could look as follows: \[\texttt{which} \mathrel{:} \bullet n\ d\ w\]
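To make the bookkeeping concrete, here is a small sketch in Python of one possible encoding. The `"+"`/`"-"` polarity marks and the helper names `pos` and `neg` are my own choices, not notation from the text.

```python
# A hypothetical encoding of features: an atomic feature is a string
# drawn from At, and a feature is a (polarity, atom) pair, with "+"
# standing for a positive feature (written •x) and "-" for a negative
# one (written x). A feature bundle is a list of features.
At = {"n", "d", "w", "t", "c"}

def pos(x):
    return ("+", x)   # •x : a need to govern an x

def neg(x):
    return ("-", x)   # x : a need to be governed as an x

# the running example: which : •n d w
which = [pos("n"), neg("d"), neg("w")]
print(which)   # [('+', 'n'), ('-', 'd'), ('-', 'w')]
```

The list order encodes the order in which features must be checked, mirroring the head/tail structure of lists described in footnote 3.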

The distinction between positive and negative (or governor and governee) features mirrors a distinction between internal and external syntactic structure. The internal structure of a word is everything in its maximal projection. The external structure of a lexical item refers to where its maximal projection occurs within a larger structure. We first concentrate on the internal syntax of which. Its feature bundle indicates that it requires a noun phrase. A lexical item like mango is an NP in and of itself, and so it bears the (negative) feature \(n\). It neither requires any syntactic arguments, nor does it participate in any further syntactic dependencies. Thus the lexical entry for the word mango could look as follows: \[\texttt{mango} \mathrel{:} n\]

These lexical items (which and mango) ‘fit’ together in a syntactically relevant way:

The first unchecked features in the feature bundles of both are, respectively, the positive and negative versions of the same atomic feature.

In this situation, the two features cancel each other out (or can be checked). We can represent this by drawing a line between these two features.5
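The 'fit' criterion and the checking step can be sketched in Python, using a simple encoding of my own (a feature is a `(polarity, atom)` pair, `"+"` for \(\bullet x\) and `"-"` for \(x\)); a bundle is represented as the list of its still-unchecked features, so checking simply removes the head of each list.

```python
def fits(a, b):
    """Two bundles fit when their first unchecked features are the
    positive and negative versions of the same atomic feature."""
    (pa, xa), (pb, xb) = a[0], b[0]
    return xa == xb and pa != pb

def check(a, b):
    """'Draw a line': cancel the matching first features of each."""
    assert fits(a, b)
    del a[0], b[0]

which = [("+", "n"), ("-", "d"), ("-", "w")]   # which : •n d w
mango = [("-", "n")]                           # mango : n
print(fits(which, mango))   # True
check(which, mango)
print(which, mango)         # [('-', 'd'), ('-', 'w')] []
```

After the check, mango's bundle is exhausted, and the unchecked \(d\ w\) remains with which, matching the description in the text.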

After doing this, we have linked together two lexical items. This creates something (a connected graph) which I will call a complex syntactic object. The complex syntactic object so connected has a head,

defined as the unique item in a connected graph without any checked negative features.6, 7

It may also have unchecked features, which may be in the feature bundles of different parts of this complex. In this case, there is only one feature bundle with unchecked features, and it is the bundle \(d\ w\) of which.

As there are no more unchecked positive features, we are done with the internal syntax of which mango. As it is a DP, it must be selected by something which subcategorizes for a DP. As an example, we may choose the verb ripened. This is a tensed intransitive verb, which means that it selects a DP argument (\(\bullet d\)), and then may be selected as a tensed phrase (\(t\)). Its lexical entry might look as follows: \[\texttt{ripened}\mathrel{:}\bullet d\ t\]

This lexical item fits together with the syntactic complex which mango in the way described above (the first unchecked features in the feature bundles of the heads of both are identical but of opposite polarity).8 We can connect the matching first unchecked features of both feature bundles with a line, creating a larger syntactic complex, the head of which is now ripened. This larger complex expression, however, has two feature bundles with unchecked features. One (attached to which) contains just the unchecked feature \(w\), and the other (on the head, ripened) contains just the unchecked feature \(t\).

In our semi-naïve background syntactic theory, we need a wh-licensor for the wh-feature on the wh-word. This is usually thought to reside in a head above tense. Adapting this idea into our formal theory, we must have a new lexical item which selects a tensed phrase (\(\bullet t\)), and then licenses a wh-phrase (\(\bullet w\)). Bowing to tradition, we may call this head a complementizer (\(c\)). As it contributes nothing to the pronunciation of the sentence, we will call it (for now) ε. \[\epsilon \mathrel{:}\bullet t\ \bullet w\ c\]

Our rules for connecting expressions force us to first draw a line between the feature \(t\) in our complex expression ripened which mango and the \(\bullet t\) in the lexical item ε. This constructs a new complex expression containing all four lexical items, with two not-completely checked feature bundles: \(w\) and \(\bullet w\ c\).

For the first time we have a complex expression which contains two feature bundles with matching first features, thus satisfying in itself our criterion of ‘fitting together’. We thus draw a line connecting the positive and negative versions of the w feature. In the resulting expression, there is but a single feature which is unchecked, and it is the category feature of the head. We call such an expression complete.
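The whole derivation can be replayed as a sketch in Python (an encoding of my own: a feature is a `(polarity, atom)` pair, `"+"` for \(\bullet x\), `"-"` for \(x\)), tracking the still-unchecked features of each lexical item and cancelling the matching pair at every step.

```python
# Remaining (unchecked) features of each lexical item.
lexicon = {
    "which":   [("+", "n"), ("-", "d"), ("-", "w")],
    "mango":   [("-", "n")],
    "ripened": [("+", "d"), ("-", "t")],
    "eps":     [("+", "t"), ("+", "w"), ("-", "c")],   # the silent C
}

def check(a, b):
    """Cancel the matching first features of two bundles."""
    (pa, xa), (pb, xb) = a[0], b[0]
    assert xa == xb and pa != pb   # •x must meet x
    del a[0], b[0]

check(lexicon["which"], lexicon["mango"])     # •n / n : which mango
check(lexicon["ripened"], lexicon["which"])   # •d / d : ripened ...
check(lexicon["eps"], lexicon["ripened"])     # •t / t
check(lexicon["eps"], lexicon["which"])       # •w / w

# complete: the only unchecked feature left is the head's category c
print({w: fs for w, fs in lexicon.items() if fs})   # {'eps': [('-', 'c')]}
```

Note that swapping the order of the last two `check` calls would fail the assertion, reflecting the point that the list structure of ε's bundle forces \(\bullet t\) to be checked before \(\bullet w\).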

Our description so far has not touched on things like word order (how do we pronounce a complex syntactic object), or derived syntactic structure (how can we represent the structure of a complex syntactic object in a familiar way). Instead, all we have done so far is to indicate which features have checked which others. I think that it is useful to present in this way because it disentangles logically disparate stuff: feature checking is at the core of our syntactic formalism, and this is logically distinct from what derived structure we decide to assign, or how we decide to linearize it. We will move on to these questions soon enough (and they will be easy to answer, now that we have this feature checking idea firmly in place). However, our rule for feature checking, “check the first features of matching feature bundles,” has some undetermined (or undesired?) corner cases that we will need to pin down. As it is stated, nothing stops us from taking two syntactic complexes, and checking features from feature bundles neither of which are the heads of the complexes. This is called parallel merge in the syntactic literature (Citko 2005). I want to block this. Another (related) possibility is that one of the matching features is in a non-head feature bundle of one complex, while the other is in the head feature bundle of the other complex. This is called grafting in the syntactic literature (Van Riemsdijk 2006), and I wish to block this as well. I will stipulate the following condition on feature checking:

The head of a syntactic complex must be involved in any feature checking relationship

If there are two syntactic complexes, then both their heads must be involved (i.e. must contain one of the pair of matching features). However, if there is but one syntactic complex (as there was in the last step of our previous derivation, where the \(w\) features were checked), the head must be involved, but this allows a non-head feature bundle to host the other feature. We will also reject the possibility of two features from the same bundle checking each other (if I were more poetic I would dub this a ban on incest).

Matching features must belong to distinct feature bundles
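These two conditions can be sketched as a predicate over the two-complex case (the bookkeeping here is my own: a complex is a dict recording which of its bundles is the head; the one-complex case, where a non-head bundle may host the other feature, is not covered).

```python
def head_involved(complex_a, bundle_a, complex_b, bundle_b):
    """For checking across two complexes: both chosen bundles must be
    the heads of their complexes, and must be distinct bundles (this
    last clause is the ban on a bundle checking itself)."""
    return (bundle_a is complex_a["head"]
            and bundle_b is complex_b["head"]
            and bundle_a is not bundle_b)

which   = [("-", "d"), ("-", "w")]   # head bundle of 'which mango'
mango   = []                         # mango's bundle, fully checked
ripened = [("+", "d"), ("-", "t")]

comp_a = {"head": which, "bundles": [which, mango]}
comp_b = {"head": ripened, "bundles": [ripened]}
print(head_involved(comp_a, which, comp_b, ripened))   # True: allowed
print(head_involved(comp_a, mango, comp_b, ripened))   # False: grafting
```

The second call is blocked because mango's bundle is not the head of its complex; a call where neither bundle is a head would model (and block) parallel merge.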

Another corner case involves competition between features. As an example, if there had been multiple feature bundles in our previous example with a \(w\) feature, any one of them could have checked the \(\bullet w\) feature at the head. I will want to prohibit this.

A feature may check another only if there is no other accessible feature which could have checked it

This restriction also prohibits us from using a feature from a separate complex to check a feature that could have been checked from within the complex (in now outdated parlance, it enforces a move over merge constraint; see e.g. Shima (2000) and Wilder & Gärtner (1996/1997)). As an example, if we had a lexical item \(\texttt{why}\mathrel{:}w\) with just a \(w\) feature in its feature bundle, we could not have used this feature to check the \(\bullet w\) feature of our ripened which mango complex, because there is another accessible feature which could check it, namely the \(w\) feature in the bundle of which.
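The blocking effect at the why step can be sketched with a hypothetical helper of my own ('accessible' is simplified here to 'listed'):

```python
def blocked_by_internal(target, internal):
    """`target` is a positive feature; `internal` lists the accessible
    negative features already inside the complex (as (polarity, atom)
    pairs, "-" for a negative feature). Checking `target` against an
    outside item is blocked if something internal could check it
    instead (the move-over-merge effect)."""
    polarity, atom = target
    assert polarity == "+"   # only positive features select/license
    return ("-", atom) in internal

# the •w on the complementizer can be checked complex-internally by the
# w still unchecked on 'which', so the separate item why : w is blocked
print(blocked_by_internal(("+", "w"), [("-", "w")]))   # True
```

Had which lacked its \(w\) feature, the internal list would be empty, and why would be free to check \(\bullet w\) from outside.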

Homework

Please acquaint yourselves with this system. You can do this by coming up with lexical items (words with feature bundles), and seeing how they fit together. You can either continue with our English examples, creating lexical items suitable for use in sentences like

  1. Mary ate every mango
  2. Every mango ripened near John
  3. John gave Mary every mango that Bill saw
  4. Every mango seemed to ripen near John

Or come up with (simple) sentences in another language, either one you know or have read about.

Either way, please ensure that the constraints on feature checking are obeyed, and that each sentence corresponds to a complete expression.9

If your chosen analysis (i.e. choice of lexical items) does not allow your sentences to be derived under the constraints on feature checking given here, please note this, and try to come up with an alternative analysis that does. Do not feel pressured to impose the additional constraint on yourself that your analysis must look like the analyses you have learned previously.


  1. A feature in this sense is just a property. According to the dictionary on my computer, it is “a distinctive attribute or aspect of something.” (This is sense 4 in the OED.) What counts as distinctive depends on one’s intent, but in this case it is an aspect of a lexical item relevant to the syntactic computation (as opposed to an aspect relevant to whether it rhymes with ‘snark’, or appears more than thrice in Chaucer’s Tales of Caunterbury). ↩︎

  2. A syntactically relevant feature of a determiner is that it is a determiner. This feature will be reified into a d feature. It is sort of unfortunate that the technical term ‘feature’ is pronounced the same as the pretheoretic word ‘feature.’ ↩︎

  3. A list is either empty, or it consists of a first element (the head of the list) and the rest of the list (its tail). ↩︎

  4. That is, the negative form of an atomic feature looks just like it. ↩︎

  5. Hopefully to-be-surmounted technical difficulties currently prohibit me from actually drawing a line and showing you with a pretty picture. I really strongly recommend that you take out a piece of paper, and draw along with my verbal descriptions. ↩︎

  6. We will want to require that every lexical item has at least one negative feature, interpreted as its category. ↩︎

  7. Note that a lexical item is its own head, under this definition. ↩︎

  8. Note crucially that ripened does not fit together in this way with the lexical item which by itself. ↩︎

  9. You do not need to worry about structure, or word order. Just feature checking. ↩︎