λ Preliminaries

The λ calculus is a language for reasoning about functions.

It defines a syntax for writing down functions, as well as relations between λ-terms stating which terms represent the same function.

Syntax

In the simplest version of the λ calculus a well-formed expression is called a (λ) term, and is defined by the following inductive definition:

var
a variable is a λ term
app
an application \((M\ N)\) is a λ term whenever both \(M\) and \(N\) are λ terms
abs
an abstraction \(\lambda x.M\) is a λ term whenever \(x\) is a variable and \(M\) is a λ term

The intuition is that an abstraction (a term of the form \(λ x.M\)) represents a function, and an application (a term of the form \((M N)\)) represents giving an argument (N) to a function (M).
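
To make this concrete, the three clauses of the inductive definition can be written down as a small Haskell data type. This is only a sketch of one possible encoding; the names Term, Var, App, and Abs are my own choices, not part of the calculus.

```haskell
-- A sketch of λ-term syntax: one constructor per clause of the definition.
data Term
  = Var String        -- var: a variable is a λ term
  | App Term Term     -- app: (M N) is a λ term whenever M and N are λ terms
  | Abs String Term   -- abs: λx.M is a λ term whenever x is a variable and M is a λ term
  deriving (Eq, Show)

-- Example: the abstraction λx.x, built from the abs and var clauses.
identity :: Term
identity = Abs "x" (Var "x")
```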

The term \(\lambda x.M\) represents a function, which in high school we might have written as \(f(x) = M\). In contrast to high-school notation, where function application is written with only the argument in parentheses, \(f(4)\), here both function and argument are inside the parentheses: \((f\ 4)\), or rather \((\lambda x.M\ 4)\).

We adopt at the outset a number of abbreviatory conventions.

  1. abbreviating sequences of λs

    \(\lambda x_{1}.\lambda x_{2}.\ldots\lambda x_{k}.M \leadsto \lambda x_{1},x_{2},\ldots,x_{k}.M\)

  2. leaving parentheses out of nested applications

    \(((M\ N)\ O) \leadsto M\ N\ O\)

As an example, the λ-term \(\lambda x. \lambda y.x\) is abbreviated (according to convention 1) as \(\lambda x,y.x\). This term expects two arguments, which should be given one after the other: \(((\lambda x,y.x\ N)\ O)\). The order of the arguments is made explicit by the parenthesization, with the argument closest to the function being the first one provided. Convention 2 allows us to abbreviate this as \(\lambda x,y.x\ N\ O\), i.e. without any parentheses. The parenthesization \((\lambda x,y.x\ (N\ O))\), which would be abbreviated as \(\lambda x,y.x\ (N\ O)\), instead applies the function \(\lambda x,y.x\) to the single argument \(N\ O\), which is the function \(N\) applied to its argument \(O\).

β-rule

The intended interpretation of a λ-term \((M\ N)\) is as the application of the function \(M\) to the argument \(N\), and of a λ-term \(\lambda x.M\) as a function of \(x\). In order to formalize this we define the β-rule, which relates a term that gives an argument to a λ-abstraction to the result of that application. A term of the form \((\lambda x.M\ N)\) is called a β-redex, which is short for β-reducible expression. We write \(M \rightarrow_{\beta} N\) to indicate that \(\langle M,N\rangle \in \beta\), where β is the β-rule.

Definition (β-rule)
\((\lambda x.M\ N) \mathrel{\rightarrow_\beta} M[x:= N]\)

In the definition, the notation \(M[x:=N]\) means “\(M\), where the \(x\)s bound by the λ-abstraction have been replaced with \(N\)”. Recall how we use definitions like \(f(x) = x^{2} + 2x + 1\); when we wish to evaluate the function \(f\) at an argument \(3\), we replace the \(x\)s on the right hand side with this argument: \[f(3) = (x^{2} + 2x + 1)[x:=3] = 3^{2} + 2\cdot 3 + 1 = 16\]
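
Written with λ-terms, the same computation is a single application of the β-rule. (Arithmetic expressions like \(x^{2}+2x+1\) are not part of the pure λ calculus; we allow them here informally, just to mirror the high-school example.) \[(\lambda x.x^{2}+2x+1\ 3) \mathrel{\rightarrow_{\beta}} (x^{2}+2x+1)[x:=3] = 3^{2} + 2\cdot 3 + 1 = 16\]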

Free and Bound Variables

We are writing \(\lambda x.M\) to represent the function which takes an argument \(a\) and returns the result of substituting \(a\) for \(x\) in \(M\), which we write as \(M[x:=a]\). This raises the question of which variables we want to allow to be substituted for. For example, take \(\lambda x.M = \lambda x.\lambda x.x\). What should \(M[x:=N]\) be? One answer is that \(M[x:=N] = \lambda x.N\); the other is that \(M[x:=N] = \lambda x.x\). These correspond to asserting that the variable \(x\) is bound either by the furthest or by the closest \(\lambda x\). The second option is what is chosen. This is for semantic reasons; it allows us to give an interpretation to a term \(M\) irrespective of the context in which it occurs. For example, \(M = \lambda x.x\) is the identity function, regardless of whether it occurs by itself, or as part of some larger term (such as \(\lambda x.M\)).

To make this precise, we introduce the notion of bound and free (occurrences of) variables. A bound variable (occurrence) is one that should be substituted by an argument, and a free variable (occurrence) is one that shouldn’t. We write \(FV(M)\) (resp. \(BV(M)\)) for the set of free (resp. bound) variables in a term \(M\), which we define inductively as follows:

Definition (Free and Bound variables)
\(FV(x) = \{x\}\) \(BV(x) = \emptyset\)
\(FV(MN) = FV(M)\cup FV(N)\) \(BV(MN) = BV(M) \cup BV(N)\)
\(FV(\lambda x.M) = FV(M) - \{x\}\) \(BV(\lambda x.M) = BV(M)\cup \{x\}\)

Note that the definitions of \(FV\) and of \(BV\) adhere closely to the definitions of the syntax of λ terms! There are three possibilities to be a λ term (you are either a variable, an application, or an abstraction), and for each of these possibilities, there is a specific way of computing its free/bound variables.
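
As a small illustration, the two inductive definitions translate directly into two recursive functions over the Term data type sketched earlier, one equation per clause (Data.Set is used here to represent sets of variable names):

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

data Term = Var String | App Term Term | Abs String Term  -- as in the earlier sketch

freeVars :: Term -> Set String
freeVars (Var x)   = Set.singleton x                       -- FV(x) = {x}
freeVars (App m n) = freeVars m `Set.union` freeVars n     -- FV(MN) = FV(M) ∪ FV(N)
freeVars (Abs x m) = Set.delete x (freeVars m)             -- FV(λx.M) = FV(M) − {x}

boundVars :: Term -> Set String
boundVars (Var _)   = Set.empty                            -- BV(x) = ∅
boundVars (App m n) = boundVars m `Set.union` boundVars n  -- BV(MN) = BV(M) ∪ BV(N)
boundVars (Abs x m) = Set.insert x (boundVars m)           -- BV(λx.M) = BV(M) ∪ {x}
```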

We can now refine our intuitive description of substitution: \(M[x:=N]\) is the substitution of \(N\) for all free occurrences of \(x\) in \(M\). A fully precise version can be given as follows:

Definition (Substitution)
  • case 1: var

    • \(x[x:=N] = N\)
    • \(y[x:=N] = y\), provided \(y\neq x\)
  • case 2: app

    \((MN)[x:=O] = (M[x:=O]N[x:=O])\)

  • case 3: abs

    • \((\lambda x.M)[x:=N] = \lambda x.M\)
    • \((\lambda y.M)[x:=N] = \lambda z.M[y:=z][x:=N]\), for \(y\neq x\) and \(z\notin FV(N) \cup FV(M)\cup\{x\}\)

Again, this definition closely mirrors the syntax of λ terms. For variables, if you are substituting \(N\) for all free occurrences of \(x\) in a variable, then if the variable is \(x\) (subcase 1), you replace it with \(N\), and if not (subcase 2), you leave it alone. If you are applying one term to another, and want to replace all free \(x\)s with \(N\), you just replace all free \(x\)s with \(N\) in both terms. Finally, if you are looking at an abstraction, there are two cases to consider. In the first, the variable we wish to substitute \(N\) for (\(x\)) is the variable bound by the λ-abstraction! In this case, there are no free occurrences of \(x\) left, and we can stop. In the second case, the variable we wish to substitute \(N\) for (\(x\)) is not the variable bound by the λ-abstraction. In this case, to be very careful, we change the name of the variable bound by the abstraction to one which is free neither in the body of the term we are substituting within nor in the term we are substituting. After this name change, we continue with our substitution. This renaming step is necessary to ensure that we do not accidentally capture a free variable occurring within \(N\).
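
The definition can likewise be turned into a capture-avoiding substitution function. The sketch below builds on the Term type and the freeVars function from the previous sketches; freshAwayFrom is a helper of my own invention that picks a variable name outside a given set, standing in for the side condition on \(z\).

```haskell
-- subst m x n computes M[x := N], renaming bound variables to avoid capture.
-- Builds on Term and freeVars from the sketches above.
subst :: Term -> String -> Term -> Term
subst (Var y) x n
  | y == x    = n                                            -- case 1: the variable is x itself
  | otherwise = Var y                                        -- case 1: a different variable
subst (App m1 m2) x n = App (subst m1 x n) (subst m2 x n)    -- case 2: substitute in both parts
subst (Abs y m) x n
  | y == x    = Abs y m                                      -- case 3: x is not free in λx.M
  | otherwise = Abs z (subst (subst m y (Var z)) x n)        -- case 3: rename y to z, then substitute
  where
    z = freshAwayFrom (Set.insert x (freeVars n `Set.union` freeVars m))

-- Helper (not part of the calculus): a variable name outside the avoid set.
freshAwayFrom :: Set String -> String
freshAwayFrom avoid =
  head [v | i <- [0 :: Int ..], let v = "v" ++ show i, not (v `Set.member` avoid)]
```

Following the definition literally, the bound variable is renamed every time the second abstraction case applies, even when no capture threatens; a more frugal implementation would rename only when necessary.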

Notions of reduction

We have encountered the relation β (relating a term of the form \((\lambda x.M\ N)\) to \(M[x:=N]\)), which allows us to say formally how applying a function to its argument is related to its result. Our syntactic characterization of functions has, however, many properties which do not capture our intent. One is that there is currently no way for us to say that the terms \(\lambda x.x\) and \(\lambda y.y\) are related in any way. Clearly, however, they both represent the identity function: the function which simply returns the argument it is given. The α rule allows us to rename bound variables in a systematic way, formalizing the idea that the names of bound variables are inessential.

Definition (α-rule)
\(\lambda x.M \rightarrow_{\alpha} \lambda z.M[x:=z]\), provided that \(z\) does not occur in \(M\)

While β captures the idea of applying a function to an argument, α captures the idea that the names of bound variables don’t matter.
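
As a sketch of how this can be put to work, α-equivalence of two terms can be tested by renaming the bound variables on both sides to a common fresh name at each abstraction. The code reuses Term, freeVars, subst, and freshAwayFrom from the sketches above.

```haskell
-- Test whether two terms are α-equivalent.
alphaEq :: Term -> Term -> Bool
alphaEq (Var x)   (Var y)     = x == y
alphaEq (App m n) (App m' n') = alphaEq m m' && alphaEq n n'
alphaEq (Abs x m) (Abs y n)   = alphaEq (subst m x (Var z)) (subst n y (Var z))
  where z = freshAwayFrom (Set.unions [freeVars m, freeVars n, Set.fromList [x, y]])
alphaEq _         _           = False

-- For example, alphaEq (Abs "x" (Var "x")) (Abs "y" (Var "y")) is True.
```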

Sometimes a third relation η is assumed.

Definition (η-rule)
\(\lambda x. (M\ x) \mathrel{\rightarrow_{\eta}} M\), provided that \(x\notin FV(M)\)

η is not always assumed because it has the consequence of making identity of λ-terms extensional.1

Extending relations

As stated, the β-rule only applies to a β-redex, i.e. a λ-term which is an application whose first term is an abstraction. (Similarly, the α- and η-rules only apply to α- or η-redexes.) We would like, however, for the β-rule to apply to terms which merely contain β-redexes, such as the following: \(((\lambda x,y.x\ N)\ O)\). Here, the term is an application whose first term is also an application. Thus, the term as a whole is not a β-redex. Its first term, however, is.

We can extend relations between λ-terms so as to relate terms that contain pieces which are related by the original relation. This is called the compatible closure of a relation \(R\), and we define it as follows:

Definition (Compatible closure)
  • R:: if \(\langle a,b\rangle \in R\), then \(a \rightarrow_R b\)
  • app1:: if \(M \rightarrow_R M’\), then \((MN)\rightarrow_R (M’N)\)
  • app2:: if \(N\rightarrow_R N’\), then \((MN)\rightarrow_R (MN’)\)
  • abs:: if \(M\rightarrow_R M’\), then \(\lambda x.M\rightarrow_R \lambda x.M’\)

We can call the relation \(\rightarrow_R\) so generated from any \(R\) a (one-step) reduction relation. We can see that I anticipated this development in the definitions of the α, β, and η rules, where I already wrote \(\rightarrow_{\alpha/\beta/\eta}\).
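
To illustrate, the compatible closure of β can be turned into a one-step reduction function. The sketch below commits to one particular choice of which redex to contract (the leftmost, outermost one); the relation itself is indifferent to this choice. It reuses Term and subst from the sketches above.

```haskell
-- One step of ->β under the compatible closure, contracting the
-- leftmost-outermost redex; Nothing means the term contains no β-redex.
step :: Term -> Maybe Term
step (App (Abs x m) n) = Just (subst m x n)   -- R: the term is itself a β-redex
step (App m n) =
  case step m of
    Just m' -> Just (App m' n)                -- app1: reduce inside the functor
    Nothing -> App m <$> step n               -- app2: otherwise, inside the argument
step (Abs x m) = Abs x <$> step m             -- abs: reduce inside the body
step (Var _)   = Nothing                      -- a lone variable contains no redex
```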

Given any relation \(R\), there are two fundamental questions we might ask about its relata:

  1. can we get from \(a\) to \(b\) just by moving to \(R\)-related elements? This question views \(R\) as a transforming device, which only works in one direction.
  2. are two elements connected to each other via \(R\)? This question views \(R\) non-directionally.

The first question is formalized as the reflexive transitive closure of a relation.

Definition (Reflexive Transitive closure)
  • R: if \(a R b\) then \(a R^{*} b\)
  • refl: for every \(a\), \(a R^* a\)
  • trans: if \(aR^*b\) and \(bR^*c\) then \(aR^*c\)

The reflexive transitive closure of a reduction relation holds between terms \(M\) and \(N\) if \(M\) can be reduced to \(N\) in some finite number of reduction steps.
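
A minimal sketch: iterating the one-step function from above until no redex remains walks along \(\rightarrow^{*}_{\beta}\), the reflexive transitive closure of \(\rightarrow_{\beta}\). Termination is not guaranteed, since some terms can be reduced forever.

```haskell
-- Follow ->β steps until no redex remains, producing a term N with M ->β* N.
-- May loop forever on terms without a normal form.
normalize :: Term -> Term
normalize m = maybe m normalize (step m)
```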

The second question is formalized as the reflexive, symmetric, transitive closure of a relation.

Definition (Reflexive Symmetric Transitive closure)
  • R: if \(a R b\) then \(a \equiv_{R} b\)
  • refl: for every \(a\), \(a \equiv_{R}a\)
  • sym: if \(a\equiv_{R} b\) then \(b\equiv_R a\)
  • trans: if \(a\equiv_{R}b\) and \(b\equiv_{R}c\) then \(a\equiv_{R}c\)

A reflexive, symmetric and transitive relation is called an equivalence relation. We are primarily interested in the equivalences between terms generated by α-, β- and η-reduction. These equivalence classes represent the underlying functions we are interested in, abstracting away from the syntactic accidents of the language we are using to represent them.

Given a λ-term \(M\), we write \([M]_{\equiv}\) for the set of all terms (α-, β-, and η-)equivalent to \(M\).

Substitution

Now that we are interested in terms only up to α-equivalence, we can adopt the variable convention to make life easier:

variable convention
always assume that bound variables are distinct from free variables in definitions, formulae, and proofs

This allows us to define substitution (\(M[x:=N]\)) in the following terms:

var1
\(x[x:=N] = N\)
var2
\(y[x:=N] = y\), provided \(y\neq x\)
app
\((MN)[x:=O] = (M[x:=O]N[x:=O])\)
abs
\((\lambda y.M)[x:=N] = \lambda y.M[x:=N]\)

By the variable convention, we can assume both that \(y\not= x\) and that \(y \notin FV(N)\) in the abs case.2

Typing

The intended referents of our λ-terms are functions with well-behaved domains and ranges. In other words, the functions we intend to consider all take a particular number of arguments, of a particular kind, and return a particular kind of result. For example, a transitive verb will denote a function which takes two arguments and which returns a truth value. Moreover these two arguments will be entities (and not, say, other verbs). Of such functions, we say that they have type \(e\rightarrow e \rightarrow t\), where \(e\) is the type of entities, and \(t\) the type of truth values.

We will develop a type assignment system which will allow us to assign types to λ-terms. A (type assignment) statement of the form \(M : A\), where \(M\) is a term and \(A\) a type, is read as ‘\(M\) has type \(A\).’ While in linguistic semantics we often have the types \(e\) and \(t\), we sometimes see types like \(d\) (for degrees), or \(v\) (for events), and so on. The actual types we use for a particular setting depend on that particular setting, and so we would like to abstract away from any particular set of types. Given an arbitrary set \(\mathbb{A}\) (of basic types), the set of all types over \(\mathbb{A}\) is defined as the closure of \(\mathbb{A}\) under the binary type forming operator \(\rightarrow\):

Definition (simple types)
  • any basic type is a simple type
  • if α and β are simple types, so is (α → β)
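
As a sketch, with basic types represented by strings (an arbitrary stand-in for the set \(\mathbb{A}\)), the two clauses become:

```haskell
-- Simple types over a set of basic types, here represented as strings.
data Type
  = Basic String      -- a basic type, e.g. Basic "e" or Basic "t"
  | Arrow Type Type   -- (α → β)
  deriving (Eq, Show)
```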

A λ term of the form \(x\) contains just a free variable, and, depending on how we interpret this variable, could represent just about anything. Accordingly, we will ask what type assignment statements are true given a particular type assignment to the variables; this will be called a typing judgment. We will write typing judgments as \(\Gamma \vdash M : A\), for \(\Gamma\) a function assigning types to variables, and \(M : A\) a type assignment statement, to indicate that \(M\) has type \(A\) in the context of \(\Gamma\).

Definition (Axiom)
\(\Gamma \vdash x : A\) iff \(\Gamma(x) = A\)

A variable has a type \(A\) in a context which assigns to it the type \(A\).

Definition (→-elimination)
if \(\Gamma \vdash M : A\rightarrow B\) and \(\Gamma \vdash N : A\) then \(\Gamma \vdash (M\ N) : B\)

An application has type \(B\) if its functor has type \(A\rightarrow B\) and its argument has type \(A\).

Definition (→-introduction)
if \(\Gamma, x : A \vdash M : B\) then \(\Gamma \vdash \lambda x.M : A \rightarrow B\)

An abstraction has type \(A\rightarrow B\) if, assuming that the variable it abstracts over has type \(A\), its body has type \(B\).

For \(\Gamma\) a variable type assignment, \(x\) a variable, and \(A\) a type, \(\Gamma, x : A\) is the variable type assignment like \(\Gamma\) except that it assigns to \(x\) the type \(A\).

These three rules should sound intuitively right. We can use them to determine whether a term \(M\) can have a particular type \(A\).
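
For example, the Axiom gives \(x : A \vdash x : A\), and →-introduction then yields \(\vdash \lambda x.x : A \rightarrow A\): in the empty context, the identity function can be assigned the type \(A \rightarrow A\) for any type \(A\).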

It is sometimes convenient to have typing information in the term itself. To this end, we write \(\lambda x^{A}.M\) instead of just \(\lambda x.M\). The \(\rightarrow\)-introduction rule then becomes: \[\text{if } \Gamma, x : A \vdash M : B \text{ then } \Gamma \vdash \lambda x^{A}.M : A \rightarrow B\]
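
With annotations in place, computing the type of a term (if it has one) is a direct recursion following the three rules. The sketch below assumes the Type data type from above; TTerm is an annotated variant of the term syntax introduced just for this example, and Data.Map is used to represent contexts \(\Gamma\).

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- λ-terms whose bound variables carry type annotations, as in λx^A.M.
data TTerm = TVar String | TApp TTerm TTerm | TAbs String Type TTerm

-- Γ: a finite assignment of types to variables.
type Context = Map String Type

-- typeOf Γ M returns Just A when Γ ⊢ M : A is derivable, and Nothing otherwise.
typeOf :: Context -> TTerm -> Maybe Type
typeOf ctx (TVar x) = Map.lookup x ctx                -- Axiom: Γ(x) = A
typeOf ctx (TApp m n) =                               -- →-elimination
  case (typeOf ctx m, typeOf ctx n) of
    (Just (Arrow a b), Just a') | a == a' -> Just b
    _                                     -> Nothing
typeOf ctx (TAbs x a m) =                             -- →-introduction
  Arrow a <$> typeOf (Map.insert x a ctx) m
```

For instance, applied to the empty context and the annotated identity \(\lambda x^{e}.x\), typeOf returns Just (Arrow (Basic "e") (Basic "e")).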

Semantics

We have spent a lot of time developing our syntactic perspective on functions, but we have not said anything systematic about which functions our λ terms actually represent!

Given a set \(\mathbb{A}\) of basic types, we require a family of denotation domains, one for each basic type, which we write \((D_{a})_{a\in \mathbb{A}}\). Each \(D_{a}\) is the denotation domain for the basic type \(a\).

Given denotation domains for the basic types, we define denotation domains for the implicational types via exponentiation (i.e. function spaces): \[D_{A\rightarrow B} = [D_{A} \rightarrow D_{B}]\] Here \([X\rightarrow Y]\) is the set of all functions from input set \(X\) to output set \(Y\).

Denotation domains defined in this way (i.e. where the denotation domain of an arrow type is the set of all functions from the denotation domain of the input type to the denotation domain of the output type) are called full type structures.

λ-terms may contain free variables. We can only interpret these by stipulation. Accordingly, any meaningful term will denote a function from variable assignments to the object we would normally think of it as denoting. A variable assignment \(\rho\) is a function from variables to objects in the available denotation domains. We say that a variable assignment \(\rho\) conforms to a variable type assignment \(\Gamma\) just in case for each variable \(x\), \(\rho(x)\in D_{\Gamma(x)}\). That is, \(\rho\) interprets each variable as an object of the type specified by \(\Gamma\).

Formally we will assign meanings to typing judgments \(\Gamma \vdash M : A\); these will be functions from variable assignments to elements of the domain. Crucially, they will be defined only on assignments that conform to \(\Gamma\).

Definition (Axiom Semantics)
\(\left[\left[\Gamma \vdash x : A\right]\right](\rho) = \rho(x)\), for \(\rho\) any variable assignment conforming to \(\Gamma\)

A variable is interpreted as whatever the variable assignment assigns to it.

Definition (→-Elimination Semantics)
\(\left[\left[\Gamma \vdash (M\ N) : B\right]\right](\rho) = \left[\left[\Gamma \vdash M : A \rightarrow B\right]\right](\rho)(\left[\left[\Gamma \vdash N : A\right]\right](\rho))\)

An application is interpreted as the result of applying the interpretation of its functor to the interpretation of its argument.

Definition (→-Introduction Semantics)
for any object \(a \in D_{A}\), \(\left[\left[\Gamma \vdash \lambda x.M : A \rightarrow B\right]\right](\rho)(a) = \left[\left[\Gamma, x: A \vdash M : B\right]\right](\rho[x:= a])\)

An abstraction is interpreted as a function which, given any input \(a\), returns the value of its body when the variable abstracted over is interpreted as \(a\).
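
Putting the three clauses together, here is a sketch of an interpreter over the (unannotated) Term type from the earlier sketches. The representation of semantic values below is a simplification of my own: elements of base domains are just tagged strings, and typing is assumed rather than checked, so applying a non-function is a runtime error.

```haskell
-- Semantic values in a full type structure: an element of some base domain,
-- or a function from values to values.
data Val = Base String | Fun (Val -> Val)

-- A variable assignment ρ, mapping variable names to semantic values.
type Assignment = String -> Val

-- ρ[x := a]: like ρ, except that x is sent to a.
update :: Assignment -> String -> Val -> Assignment
update rho x a y = if y == x then a else rho y

-- Interpretation relative to an assignment, one clause per semantic rule.
interp :: Term -> Assignment -> Val
interp (Var x)   rho = rho x                                  -- Axiom semantics: ρ(x)
interp (App m n) rho =                                        -- →-elimination semantics
  case interp m rho of
    Fun f -> f (interp n rho)
    _     -> error "ill-typed application"
interp (Abs x m) rho = Fun (\a -> interp m (update rho x a))  -- →-introduction semantics
```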

As you might expect, the variable assignment only influences the interpretation of free variables in a term. Therefore, closed terms, i.e. those without free variables, denote the same object irrespective of the variable assignment.


  1. This means that \(M \equiv N\) if for all \(x\) not in \(M\) or \(N\), \(M\ x \equiv N\ x\). Thus, with η, we can no longer distinguish between different ways of computing the same functional relation. ↩︎

  2. Note that this definition of substitution only works if we remember to rename bound variables appropriately before each β-reduction step. ↩︎