Functional Programming and the Lambda Calculus

Our first lecture is about the lambda calculus, which many of you are familiar with from your previous semantics course.

Lecture 1: The Lambda Calculus

What follows is more than is relevant for the lecture, which only discusses the material up to (but not including) the section Numbers.

The λ calculus

The λ calculus is a language for reasoning about functions.

Syntax

In the simplest version of the λ calculus a well-formed expression is called a (λ) term, and is defined by the following inductive definition:

var
a variable is a λ term
app
an application \((M N)\) is a λ term whenever both \(M\) and \(N\) are λ terms
abs
an abstraction \(\lambda x. M\) is a λ term whenever x is a variable and \(M\) is a term

The intuition is that an abstraction (a term of the form \(λ x.M\)) represents a function, and an application (a term of the form \((M N)\)) represents giving an argument (N) to a function (M).

The term \(\lambda x.M\) represents a function, which in high school we might have written like this: \(f(x) = M\). In contrast to high school, where function application is written with only the argument in parentheses, \(f(4)\), here both function and argument are within the parentheses: \((f\ 4)\) or rather \((\lambda x.M\ 4)\).
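
For concreteness, the inductive definition of terms can be transcribed directly as a datatype in a functional language. Below is a minimal Haskell sketch (the names Term, Var, App, and Abs are our own, not part of the calculus); later sketches in these notes build on it.

    -- λ-terms, mirroring the inductive definition above:
    -- a variable, an application (M N), or an abstraction λx.M.
    type Name = String

    data Term
      = Var Name        -- var
      | App Term Term   -- app
      | Abs Name Term   -- abs
      deriving (Eq, Show)

    -- Example: the term λx.λy.x
    k :: Term
    k = Abs "x" (Abs "y" (Var "x"))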

We adopt at the outset a number of abbreviatory conventions.

  1. abbreviating sequences of λs

    \(\lambda x_{1}.\lambda x_{2}.\ldots\lambda x_{k}. M \leadsto \lambda x_{1},x_{2},\ldots,x_{k}.M\)

  2. leaving parentheses out of nested applications

    \(((M\ N)\ O) \leadsto M\ N\ O\)

As an example, the λ-term \(\lambda x. \lambda y.x\) is abbreviated (according to convention 1) as \(\lambda x,y.x\). This term expects two arguments, which should be given one after the other: \(((\lambda x,y.x\ N)\ O)\). The order of the provided arguments is made explicit by the parenthesization, with the argument closest to the function being the first one provided. Convention 2 allows us to abbreviate this as \(\lambda x,y.x\ N\ O\), i.e. without any parentheses. The parenthesization \((\lambda x,y.x\ (N\ O))\), which would be abbreviated as \(\lambda x,y.x\ (N\ O)\), applies the function \(\lambda x,y.x\) to the single argument \(N\ O\), which is the function \(N\) applied to its argument \(O\).

Free and Bound Variables

We are writing \(\lambda x.M\) to represent the function which takes an argument \(a\) and returns the result of substituting \(a\) for \(x\) in \(M\), which we write as \(M[x:=a]\). This raises the question of which variables we want to allow to be substituted for. For example, take \(\lambda x.M = \lambda x.\lambda x.x\). What should \(M[x:=N]\) be? One answer is that \(M[x:=N] = \lambda x.N\), the other is that \(M[x:=N] = \lambda x.x\). They correspond to asserting that the variable \(x\) is bound either by the furthest, or the closest, \(\lambda x\). The second option is what is chosen. This is for semantic reasons; it allows us to give an interpretation to a term \(M\) irrespective of the context in which it occurs. For example, \(M = \lambda x.x\) is the identity function, regardless of whether it occurs by itself, or as part of some larger term (such as \(\lambda x.M\)).

To make this precise, we introduce the notion of bound and free (occurrences of) variables. A bound variable (occurrence) is one that should be substituted by an argument, and a free variable (occurrence) is one that shouldn’t. We write \(FV(M)\) (resp. \(BV(M)\)) for the set of free (bound) variables in a term \(M\), which we define inductively as follows:

\(FV(x) = \{x\}\) \(BV(x) = \emptyset\)
\(FV(MN) = FV(M)\cup FV(N)\) \(BV(MN) = BV(M) \cup BV(N)\)
\(FV(\lambda x.M) = FV(M) - \{x\}\) \(BV(\lambda x.M) = BV(M)\cup \{x\}\)
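
Transcribed onto the Term sketch above (using Data.Set for sets of variables), the two inductive definitions become:

    import qualified Data.Set as Set
    import Data.Set (Set)

    freeVars :: Term -> Set Name
    freeVars (Var x)   = Set.singleton x
    freeVars (App m n) = freeVars m `Set.union` freeVars n
    freeVars (Abs x m) = Set.delete x (freeVars m)

    boundVars :: Term -> Set Name
    boundVars (Var _)   = Set.empty
    boundVars (App m n) = boundVars m `Set.union` boundVars n
    boundVars (Abs x m) = Set.insert x (boundVars m)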

We now can refine our intuitive description of substitution; \(M[x:=N]\) is the substitution of \(N\) for all free occurrences of \(x\) in \(M\). A fully precise version of this can be given as follows:

  • Definition (Substitution)
    • case 1: var

      • \(x[x:=N] = N\)
      • \(y[x:=N] = y\), provided \(y \neq x\)
    • case 2: app

      \((MN)[x:=O] = (M[x:=O]N[x:=O])\)

    • case 3: abs

      • \((\lambda x.M)[x:=N] = \lambda x.M\)
      • \((\lambda y.M)[x:=N] = \lambda z.M[y:=z][x:=N]\), for \(y \neq x\) and \(z\notin FV(N) \cup FV(M)\)

Of course, the additional difficulty is localized in case 3 (abstraction)! There are two subcases to consider. In the first, the variable we wish to substitute \(N\) for (\(x\)) is the very variable bound by the λ-abstraction. In this case, \(\lambda x.M\) contains no free occurrences of \(x\), and we can stop. In the second case, the variable we wish to substitute \(N\) for (\(x\)) is not the variable bound by the λ-abstraction. In this case, to be very careful, we change the name of the variable bound by the abstraction to one which is free neither in the body of the term we are substituting within nor in the term we are substituting. After this name change, we continue with our substitution. This renaming step is necessary so as to ensure that we do not accidentally capture a free variable occurring within \(N\).
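
The three cases translate almost verbatim into Haskell. Here is a sketch of capture-avoiding substitution, building on Term, freeVars, and Set from above; the fresh-name generator freshName is our own crude choice.

    -- subst m x n computes M[x := N], following the cases above.
    subst :: Term -> Name -> Term -> Term
    subst (Var y) x n
      | y == x    = n                          -- case 1: x[x:=N] = N
      | otherwise = Var y                      --         y[x:=N] = y
    subst (App m1 m2) x n =
      App (subst m1 x n) (subst m2 x n)        -- case 2: substitute in both parts
    subst (Abs y m) x n
      | y == x    = Abs y m                    -- case 3: x is rebound; stop
      | otherwise = Abs z (subst (subst m y (Var z)) x n)
      where                                    -- case 3: rename y to a fresh z first
        z = freshName (freeVars n `Set.union` freeVars m)

    -- Produce a name outside the given set (crude but adequate).
    freshName :: Set Name -> Name
    freshName used =
      head [ v | i <- [0 :: Int ..]
               , let v = "v" ++ show i
               , not (v `Set.member` used) ]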

Self-Test Exercises
  1. Prove that the revised definition of substitution makes the following equalities hold:
    1. \((\lambda x.\lambda x.x)[x:=y] = \lambda x.\lambda x.x\)
    2. \((\lambda x.y)[y:=\lambda x.x] = \lambda z.\lambda x.x\)
  2. Describe in words what the intended interpretation of the λ-term \(\lambda x.\lambda y.x\) is. (What function does it represent?)

β-reduction

The intended interpretation of a λ-term \((MN)\) is as the application of the function \(M\) to the argument \(N\), and of a λ-term \(\lambda x. M\) as a function of \(x\). In order to formalise this we define the notion of β-reduction, which relates a term that gives an argument to a λ-abstraction to the result of that application. We write \(M \rightarrow_{\beta} N\) to indicate that \(\langle M,N\rangle \in \beta\), where β is the relation of β-reduction.

  • Definition (β-reduction)

    \((\lambda x.M\ N) \mathrel{\rightarrow_\beta} M[x:= N]\)

A term of the form \((\lambda x.M\ N)\) is called a β-redex, which is short for a β-reducible expression.

Self-Test Exercise
  1. What is the result of applying \(\lambda x.\lambda y.x\) to the argument \(y\)? (I.e., what does \(((\lambda x.\lambda y.x)\ y)\) β-reduce to?) Is this intuitively correct? What do we want the result to be?

Extending relations

As stated, β-reduction only applies to a β-redex, i.e. a λ-term which is an application whose first term is an abstraction. We would like, however, for β-reduction to be able to apply to terms which contain β-redices, such as the following: \(((\lambda x,y.x\ N)\ O)\). Here, the term is an application whose first term is also an application. Thus, the term as a whole is not a β-redex. Its first term, however, is.

We can extend relations between λ-terms so as to relate terms that contain pieces which are related by the original relation. The resulting relation is called the compatible closure of the original relation \(R\), and we define it as follows:

  • Definition (Compatible closure)
    • R:: if \(\langle a,b\rangle \in R\), then \(a \rightarrow_R b\)
    • app1:: if \(M \rightarrow_R M’\), then \((MN)\rightarrow_R (M’N)\)
    • app2:: if \(N\rightarrow_R N’\), then \((MN)\rightarrow_R (MN’)\)
    • abs:: if \(M\rightarrow_R M’\), then \(\lambda x.M\rightarrow_R \lambda x.M’\)

We can call the relation \(\rightarrow_R\) so generated from any \(R\) a (one-step) reduction relation.
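
The four clauses can be read off as a function computing all one-step β-reducts of a term. Here is a sketch built on the Term and subst definitions above (the function names are ours):

    -- All terms reachable from the input by one step of →β.
    stepBeta :: Term -> [Term]
    stepBeta t = contract t ++ inside t
      where
        -- clause R: contract a top-level β-redex, if there is one
        contract (App (Abs x m) n) = [subst m x n]
        contract _                 = []
        -- clauses app1, app2, abs: reduce inside a subterm
        inside (App m n) = [ App m' n | m' <- stepBeta m ]
                        ++ [ App m n' | n' <- stepBeta n ]
        inside (Abs x m) = [ Abs x m' | m' <- stepBeta m ]
        inside (Var _)   = []

A term is then in normal form (a notion we return to below) exactly when stepBeta returns the empty list.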

Sometimes we are interested in many-step reduction relations (\(\rightarrow_R^*\)), which we define as the reflexive transitive closure of the one-step relation as follows:

\(\rightarrow_R\)
if \(M\rightarrow_R N\) then \(M\rightarrow_R^* N\)
refl
for every \(M\), \(M\rightarrow_R^* M\)
trans
if \(M\rightarrow_R^* N\) and \(N\rightarrow_R^* O\) then \(M\rightarrow_R^* O\)

Other times we are interested in an equivalence relation between terms (\(\equiv_R\)), which is the transitive and symmetric closure of the many-step relation:

\(\rightarrow_R^*\)
if \(M\rightarrow_R^* N\) then \(M\equiv_R N\)
trans
if \(M\equiv_R N\) and \(N\equiv_R O\) then \(M\equiv_R O\)
sym
if \(M\equiv_R N\) then \(N\equiv_R M\)

Notions of reduction

We have encountered the relation β (relating a term of the form \((\lambda x.M\ N)\) to \(M[x:=N]\)), which allows us to say formally how applying a function to its argument is related to its result. There are, however, properties we intend functions to have which our syntactic characterization does not yet capture. One is that there is currently no way for us to say that the terms \(\lambda x.x\) and \(\lambda y.y\) are related in any way. Clearly, however, they both represent the identity function: the function which simply returns the argument it is given. The α rule allows us to rename bound variables in a systematic way, which formalizes the idea that the names of bound variables are inessential.

  • Definition (α reduction)

    \(\lambda x.M \rightarrow_{\alpha} \lambda z.M[x:=z]\), provided that \(z\) does not occur in \(M\)

While β captures the idea of applying a function to an argument, α captures the idea that the names of bound variables don’t matter. Accordingly, we extend these relations in different ways.

  • We extend β to a (many-step) reduction relation, so as to investigate the process of (repeatedly) applying functions to their arguments.
  • We extend α to an equivalence relation (\(\equiv_\alpha\)), so as to equate terms that differ only in the names of their bound variables.

Sometimes a third relation η is assumed.

  • Definition (η reduction)

    \(\lambda x. (M\ x) \mathrel{\rightarrow_{\eta}} M\), provided that \(x\notin FV(M)\)

η is not always assumed (indeed, we will not assume it at the moment), because it has the consequence of making identity of λ-terms extensional.1 Note that, looking at β-equivalence, we have that \((\lambda x.(M\ x)\ N) \equiv_\beta (M\ N)\) whenever \(x \notin FV(M)\). In other words, \(\lambda x.(M\ x)\) and \(M\) give the same results whenever we apply them to the same argument. However, without η, we cannot prove that the two terms themselves are equivalent.

Substitution

Now that we are interested in terms only up to α-equivalence, we can adopt the variable convention to make life easier:

variable convention
always assume that bound variables are distinct from free variables in definitions, formulae, and proofs

This allows us to define substitution (\(M[x:=N]\)) as follows:

var1
\(x[x:=N] = N\)
var2
\(y[x:=N] = y\)
app
\((MN)[x:=O] = (M[x:=O]N[x:=O])\)
abs
\((\lambda y.M)[x:=N] = \lambda y.M[x:=N]\)

By the variable convention, we can assume both that \(y\not= x\) and that \(y \notin FV(N)\) in the abs case.2

Self-Test Exercise
  1. Prove that, indeed, \((\lambda x.Mx\ N) \rightarrow_\beta (MN)\) whenever \(x \notin FV(M)\).
  2. β-reduce the following as much as possible: \[((\lambda x.x(\lambda z.zy))\ (\lambda z.\lambda y.zy))\] Write explicitly each step you take, and remark whenever you rename bound variables.

A quick recap

We are using closed λ-terms (terms without free variables) to represent objects and functions over those objects. We began with truth values (\(\mathbb{T}\) and \(\mathbb{F}\)), which we represented as the terms \(\mathbb{T} = \lambda x,y.x\) and \(\mathbb{F} = \lambda x,y.y\). We also defined the conditional branching operator, “if then else”, as the term \(\lambda m,n,b.bmn\). We defined three truth functions:

  • \(\textsf{and} = \lambda b,c.bc\mathbb{F}\)
  • \(\textsf{or} = \lambda b,c.b\mathbb{T}c\)
  • \(\textsf{not} = \lambda b.b\mathbb{F}\mathbb{T}\)

We are interested in the value that a term evaluates to; we define a value to be a normal form, which is any term which cannot be reduced further.
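
These boolean encodings can be replayed in Haskell, where a Church boolean is just a two-place selector. Below is a sketch with our own names; note how the simple types force \(b\) to be instantiated at a 'higher' boolean type in and, or, and not.

    -- A Church boolean selects one of its two arguments.
    type ChurchBool a = a -> a -> a

    true, false :: ChurchBool a   -- T = λx,y.x and F = λx,y.y
    true  x _ = x
    false _ y = y

    -- if-then-else = λm,n,b. b m n (the condition comes last, as in the text)
    ifThenElse :: a -> a -> ChurchBool a -> a
    ifThenElse m n b = b m n

    -- and = λb,c. b c F,  or = λb,c. b T c,  not = λb. b F T
    andC, orC :: ChurchBool (ChurchBool a) -> ChurchBool a -> ChurchBool a
    andC b c = b c false
    orC  b c = b true c

    notC :: ChurchBool (ChurchBool a) -> ChurchBool a
    notC b = b false true

    -- Convert back for testing: toBool (andC true false) == False
    toBool :: ChurchBool Bool -> Bool
    toBool b = b True False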

Self-Test Exercise
  1. Give λ-terms which represent the truth functions below

    \(\textsf{nand}\ \mathbb{T}\ \mathbb{T} = \mathbb{F}\) \(\textsf{xor}\ \mathbb{T}\ \mathbb{T} = \mathbb{F}\)
    \(\textsf{nand}\ \mathbb{T}\ \mathbb{F} = \mathbb{T}\) \(\textsf{xor}\ \mathbb{T}\ \mathbb{F} = \mathbb{T}\)
    \(\textsf{nand}\ \mathbb{F}\ \mathbb{T} = \mathbb{T}\) \(\textsf{xor}\ \mathbb{F}\ \mathbb{T} = \mathbb{T}\)
    \(\textsf{nand}\ \mathbb{F}\ \mathbb{F} = \mathbb{T}\) \(\textsf{xor}\ \mathbb{F}\ \mathbb{F} = \mathbb{F}\)

    (English paraphrases would be not both for nand and exactly one of for xor.)

  2. Prove that

    1. \(\textsf{not}(\textsf{not}\ p) = p\) for any \(p\)
    2. \(\textsf{and}\ (\textsf{or}\ p\ q)\ r = \textsf{or}\ (\textsf{and}\ r\ p)\ (\textsf{and}\ r\ q)\) for all \(p,q,r\)
  3. Can you think of a term \(M\) such that \(MM \rightarrow_\beta MM\) in one step? Does the term \(M\) have a normal form? What about \(\Omega = MM\)? Can you think of a different term that doesn’t have a normal form?

Numbers

We can represent the natural number \(n\) as the λ-term \(\ulcorner n\urcorner = \lambda s,z.s^n z\).3 The operation which tests a term to see whether it is equal to \(\ulcorner 0\urcorner\) is given by \(\textsf{isZero} := \lambda m.m\ (\lambda x.\mathbb{F})\ \mathbb{T}\). The function which adds one to any number \(n\) is given by \(\textsf{succ} := \lambda m,s,z.s\ (m\ s\ z)\). Other functions we have defined thus far are (a Haskell rendering of these encodings is sketched after the list):4

  • \(\textsf{plus} := \lambda m,n.m\ \textsf{succ}\ n\)
  • \(\textsf{mult} := \lambda m,n.m\ (\textsf{plus}\ n)\ \ulcorner 0\urcorner\)
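
Here is the promised Haskell sketch of the numeral encodings (names ours; note how, without rank-2 types, each use of a numeral as a function forces a more specialised instantiation):

    -- A Church numeral n iterates its first argument n times: λs,z. sⁿ z.
    type Church a = (a -> a) -> a -> a

    zero, one, two :: Church a
    zero _ z = z
    one  s z = s z
    two  s z = s (s z)

    -- succ = λm,s,z. s (m s z)
    succC :: Church a -> Church a
    succC m s z = s (m s z)

    -- isZero = λm. m (λx.F) T, using the Church booleans sketched earlier
    isZeroC :: Church (ChurchBool a) -> ChurchBool a
    isZeroC m = m (\_ -> false) true

    -- plus = λm,n. m succ n;  mult = λm,n. m (plus n) 0
    plusC :: Church (Church a) -> Church a -> Church a
    plusC m n = m succC n

    multC :: Church (Church a) -> Church (Church a) -> Church a
    multC m n = m (plusC n) zero

    -- Convert for testing: toInt (multC two two) == 4
    toInt :: Church Int -> Int
    toInt n = n (+ 1) 0
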
Self-Test Exercise
  1. Give step-by-step β-reductions for

    1. \(\textsf{mult}\ \ulcorner 2\urcorner\ \ulcorner 0\urcorner\)
    2. \(\textsf{mult}\ \ulcorner 0\urcorner\ \ulcorner 2\urcorner\)
  2. Define a λ-term \(\textsf{exp}\) with the following behaviour: \[\textsf{exp}\ \ulcorner m\urcorner\ \ulcorner n\urcorner =\ulcorner m^n \urcorner\] Remember that \(m^0 = 1\).

    Hint: \(m^n = \underbrace{m * m * \ldots * m}_{n\ \mathit{times}}\)

  3. We want to represent the operation of truncated subtraction, which is defined as follows: \[m \mathrel{\dot-} n := \left\{\begin{array}{ll} 0 & \mathrm{if}\ n > m \\ m - n & \mathrm{otherwise}\end{array}\right.\] Define a term \(\textsf{sub}_{tr}\) which correctly represents truncated subtraction (i.e. \(\textsf{sub}_{tr}\ \ulcorner m\urcorner\ \ulcorner n\urcorner = \ulcorner m\mathrel{\dot-}n\urcorner\)). You may take for granted the existence of an operation \(\textsf{pred}\) which behaves in the following manner: \[\textsf{pred}\ \ulcorner m\urcorner = \ulcorner m\mathrel{\dot-} 1\urcorner\]

  4. Here is one way to define \(\textsf{pred}\): \[\textsf{pred} := \lambda m,s,z.m\ (\textbf{Q}_4s)\ (\textbf{K}z)\ \textbf{I}\] where we define \(\textbf{Q}_4\), \(\textbf{K}\), and \(\textbf{I}\) as follows:5

    behaviour definition
    \(\textbf{Q}_4xyz = z(yx)\) \(\lambda x,y,z.z(yx)\)
    \(\textbf{K}xy = x\) \(\lambda x,y.x\)
    \(\textbf{I}x = x\) \(\lambda x.x\)

    This is very hard to understand.6 Do a step-by-step β-reduction of

    1. \(\textsf{pred}\ \ulcorner 0\urcorner\)
    2. \(\textsf{pred}\ \ulcorner 1\urcorner\)
    3. \(\textsf{pred}\ \ulcorner 2\urcorner\)

Pairing

A pair of elements is a data structure which contains them both, written \(\langle M,N\rangle\). The essence of a pair is that each element it contains is accessible. Therefore, we require two destructors, the first and second projection operators, which map a pair into its respective elements:

\begin{align} \pi_1\langle M,N\rangle & = M \\
\pi_2\langle M,N\rangle & = N \end{align}

Given two elements, we construct a pair by means of the pairing constructor: \[\textsf{pair}\ M\ N = \langle M,N\rangle\]

We can represent the pair \(\langle M,N\rangle\) as the λ-term \(\lambda f.fMN\). We can implement the constructor and destructors above as the following λ-terms:

\begin{align} \pi_1 & := \lambda p.(p\ \mathbb{T}) \\
\pi_2 & := \lambda p.(p\ \mathbb{F}) \\
\textsf{pair} & := \lambda m,n,f.(f\ m\ n) \end{align}

Note that

\begin{align} \pi_1 (\textsf{pair}\ M\ N) & \rightarrow_\beta \textsf{pair}\ M\ N\ \mathbb{T} \\
& \rightarrow_\beta^2 \lambda f.(f\ M\ N)\ \mathbb{T}\\
& \rightarrow_\beta \mathbb{T}\ M\ N \\
& \rightarrow_\beta^2 M\end{align}

and that

\begin{align} \pi_2 (\textsf{pair}\ M\ N) & \rightarrow_\beta \textsf{pair}\ M\ N\ \mathbb{F} \\
& \rightarrow_\beta^2 \lambda f.(f\ M\ N)\ \mathbb{F}\\
& \rightarrow_\beta \mathbb{F}\ M\ N \\
& \rightarrow_\beta^2 N\end{align}

Using pairs, we can give an intuitive (!) definition of the predecessor function on the natural numbers: \[\textsf{pred} := \lambda n.(\pi_1\ (n\ (\lambda p.(\textsf{pair}\ (\pi_2 p)\ (\textsf{succ}\ (\pi_2 p))))\ (\textsf{pair}\ \ulcorner 0\urcorner\ \ulcorner 0\urcorner)))\] The ‘meat’ of this definition is the term \(\lambda p.(\textsf{pair}\ (\pi_2 p)\ (\textsf{succ}\ (\pi_2 p)))\). What this function does is take a pair of numbers \(\langle j,k\rangle\), throw away the first element, and then return the pair of the second element and its successor: \(\langle k, k+1\rangle\).
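
The pair encoding, and this definition of pred, also go through in Haskell. A sketch (names ours; the type of predC shows how heavily the simple types must be instantiated):

    -- ⟨M,N⟩ = λf. f M N; a pair is a function awaiting a selector.
    type ChurchPair a b c = (a -> b -> c) -> c

    pairC :: a -> b -> ChurchPair a b c
    pairC m n f = f m n

    fstC :: ChurchPair a b a -> a     -- π1 = λp. p T
    fstC p = p (\m _ -> m)

    sndC :: ChurchPair a b b -> b     -- π2 = λp. p F
    sndC p = p (\_ n -> n)

    -- pred: iterate ⟨j,k⟩ ↦ ⟨k, k+1⟩ n times from ⟨0,0⟩, then take π1.
    predC :: Church (ChurchPair (Church a) (Church a) (Church a)) -> Church a
    predC n = fstC (n step (pairC zero zero))
      where step p = pairC (sndC p) (succC (sndC p))

    -- Examples: toInt (predC two) == 1, and toInt (predC zero) == 0.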

Lists

We define the set of all lists over a set \(A\) to be the smallest set \(\texttt{List}(A)\) such that:

  • the empty list, \(\textsf{nil}\), is in \(\texttt{List}(A)\)
  • if \(\ell \in \texttt{List}(A)\) and \(a \in A\), then \((\textsf{cons}\ a\ \ell) \in \texttt{List}(A)\)

We abbreviate the list \(\textsf{nil}\) as \([]\), and the list \(\textsf{cons}\ a\ \ell\) as \(a\hbox{:}\ell\). If \(a_1\hbox{:}(a_2\hbox{:}\ldots (a_n\hbox{:}\textsf{nil})\ldots )\) is a list, we abbreviate it as \([a_1,a_2,\ldots,a_n]\). (And so the list \(\textsf{cons}\ a\ \textsf{nil}\) is sometimes written \([a]\).)

There are two important `destructor’ functions on lists; the head of a non-empty list \(a\hbox{:}\ell\) is \(a\), and the tail of the same list is \(\ell\). Both are undefined on the empty list.

We can represent lists in the λ-calculus in the following way.

\begin{align} \ulcorner \textsf{nil}\urcorner & = \lambda c,n.n\\
\ulcorner a\hbox{:}\ell\urcorner & = \lambda c,n.c\ a\ (\ulcorner \ell\urcorner\ c\ n) \end{align}

In particular, the list \([1,2,3]\) is represented as the λ-term below: \[\lambda c,n.c\ \ulcorner 1\urcorner\ (c\ \ulcorner 2\urcorner\ (c\ \ulcorner 3\urcorner\ n))\]

The function \(\textsf{length} : \texttt{List}(A) \rightarrow \mathbb{N}\) which maps each list to its length can be represented in the λ-calculus as follows: \[\textsf{len} := \lambda \ell.\ell\ (\lambda x,y.\textsf{succ}\ y)\ \ulcorner 0\urcorner\] The operation \(\lambda x,y.(\textsf{succ}\ y)\) takes an element and a number, throws the element away, and increments the number. The length operation replaces each \(c\) in the representation above by this function, and replaces the \(n\) by the initial value \(\ulcorner 0\urcorner\). Thus, the function applies once per element in the list, each time adding one to the running total, which starts at zero.
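
The list encoding also transliterates directly: a Church list is its own fold. Here is a sketch (names ours), with length given both over Haskell's Int for readability and over the Church numerals exactly as in the text:

    -- ⌜nil⌝ = λc,n.n and ⌜a:ℓ⌝ = λc,n. c a (⌜ℓ⌝ c n).
    type ChurchList a r = (a -> r -> r) -> r -> r

    nilC :: ChurchList a r
    nilC _ n = n

    consC :: a -> ChurchList a r -> ChurchList a r
    consC a l c n = c a (l c n)

    -- len over Int: lenInt (consC 'a' (consC 'b' nilC)) == 2
    lenInt :: ChurchList a Int -> Int
    lenInt l = l (\_ y -> y + 1) 0

    -- len = λℓ. ℓ (λx,y. succ y) ⌜0⌝, with the Church numerals from earlier
    lenC :: ChurchList a (Church b) -> Church b
    lenC l = l (\_ y -> succC y) zero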

Self-Test Exercise
  1. Define terms \(\textsf{hd}\) and \(\textsf{tl}\) which represent the list functions \(\textsf{head}\) and \(\textsf{tail}\).

  2. Define the function \(\textsf{isNil}\) which maps the empty list to \(\mathbb{T}\) and every other list to \(\mathbb{F}\).

  3. Define the functions \(\textsf{sum}\), which adds up all of the numbers in a list (of numbers), and \(\textsf{forAll}\), which returns true just in case each member of the list (of booleans) is \(\mathbb{T}\).

    Hint: these can be defined very similarly to \(\textsf{len}\), except that you need to do something with the elements of the list.

  4. Define the function \(\textsf{append}\) which takes two lists \(\ell_1\) and \(\ell_2\), and `glues’ them together in that order. (If we view lists as sequences, \(\textsf{append}\) is just concatenation.) Your function should satisfy the following equations:

    \begin{align} \textsf{append}\ \textsf{nil}\ \ell_2 & = \ell_2 \\
    \textsf{append}\ (a\hbox{:}\ell)\ \ell_2 & = a\hbox{:}(\textsf{append}\ \ell\ \ell_2) \end{align}

  5. Define the function \(\textsf{filter}\), which takes a test \(p : A\rightarrow \textbf{Bool}\) as an argument, and then removes all elements from a list that do not pass the test (i.e. it removes elements \(e\) such that \(p(e) = \mathbb{F}\)). Your function should satisfy the following equations:

    \begin{align} \textsf{filter}\ p\ \textsf{nil} & = \textsf{nil}\\
    \textsf{filter}\ p\ a\hbox{:}\ell & = a\hbox{:}(\textsf{filter}\ p\ \ell) \text{, if $p(a) = \mathbb{T}$}\\
    \textsf{filter}\ p\ a\hbox{:}\ell & = \textsf{filter}\ p\ \ell\text{, if $p(a) = \mathbb{F}$} \end{align}

Fixed points

Given a function \(F : A \rightarrow A\), a fixed point of \(F\) is any \(a \in A\) such that \(F(a) = a\). Note that, if \(F(a) = a\), \(a = F(F(a)) = F(F(F(a))) = \ldots = F^n(a)\) for any \(n\in \mathbb{N}\).

We can define a term \(\Theta\), called a fixed point combinator, which maps any term \(M\) to a fixed point for \(M\) (i.e., \(M\ (\Theta\ M) = \Theta\ M\)), as follows: \[\Theta := (\lambda x,y.y\ (x\ x\ y))\ (\lambda x,y.y\ (x\ x\ y))\] Defining \(\theta := \lambda x,y.y\ (x\ x\ y)\), we can write \(\Theta\) as the more manageable \(\Theta = \theta\theta\).

We can in fact arrive at this term in a principled way. First note that if we have a recursive equation of the form \(\textsf{f} = N\ \textsf{f}\), for \(\textsf{f}\) a variable (representing the name of the operation we are trying to define) and \(N\) any λ-term, we can define this internally to the λ-calculus as the term \(\textsf{f} := f’\ f’\), where \(f’ := \lambda f.(N\ (f\ f))\). Note that \(\textsf{f} = f’\ f’ = \lambda f.(N\ (f\ f))\ f’ \rightarrow_\beta N\ (f’\ f’) = N\ \textsf{f}\), which is the original recursive equation we wanted \(\textsf{f}\) to satisfy.

Now, we have the recursive equation \(\Theta\ M = M\ (\Theta\ M)\). We can abstract over the argument \(M\) to arrive at \(\Theta = \lambda y.y\ (\Theta\ y)\). Using the strategy above, we identify \(N = \lambda t,y.y\ (t\ y)\), and rewrite the equation as \(\Theta = (\lambda t,y.y\ (t\ y))\ \Theta\). Now we define \(\Theta := \Theta’\ \Theta’\), where \(\Theta’ := \lambda x.\ (N\ (x\ x)) = \lambda x.\ ((\lambda t,y.y\ (t\ y))\ (x\ x)) \rightarrow_\beta \lambda x,y.y\ (x\ x\ y)\), which is exactly the term \(\theta\) that we defined above. (We can perform this technique on any recursive equation; consider \(\Omega = \Omega\). Here we have \(N = \lambda y.y\), and so we can rewrite the equation as \(\Omega = (\lambda y.y)\ \Omega\), and we can set \(\Omega := \omega\omega\), where \(\omega := \lambda x.x\ x \equiv_\beta \lambda x.(\lambda y.y)\ (x\ x)\). Note that \(\Omega\), as defined, is the `typical’ term with no normal form.)

Note that \(\Theta\), and fixed point combinators (i.e. recursion) in general, allow us to easily write λ-terms which do not have a normal form; consider \(\Theta\ (\lambda x.x)\), which β-reduces to itself in two steps. This contrasts with our previous way of defining functions on numbers and lists, which were guaranteed to have a normal form so long as the arguments we passed them did.
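
In Haskell, \(\Theta\) itself does not typecheck as written (the self-application \(x\ x\) needs a recursive type), but its fixed-point behaviour is definable. The sketch below (names ours) gives both the one-line recursive version and a typed rendering of \(\theta\theta\) via a wrapper newtype:

    -- The fixed point equation, taken as a definition: fixC f = f (fixC f).
    fixC :: (a -> a) -> a
    fixC f = f (fixC f)

    -- Example: factorial from its recursive equation.
    factorial :: Integer -> Integer
    factorial = fixC (\f n -> if n == 0 then 1 else n * f (n - 1))

    -- Typing θ = λx,y. y (x x y): wrap the self-applied argument.
    newtype Rec a = Roll (Rec a -> a)

    unroll :: Rec a -> Rec a -> a
    unroll (Roll g) = g

    theta :: Rec ((a -> a) -> a) -> (a -> a) -> a
    theta x y = y (unroll x x y)

    -- Θ = θ θ; bigTheta behaves just like fixC.
    bigTheta :: (a -> a) -> a
    bigTheta = theta (Roll theta)

Laziness matters here: fixC and bigTheta only terminate because \(f\) need not evaluate its argument first.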

Partial Recursive Functions

In the 1930s, mathematicians were making great progress at nailing down the intuitive notion of an algorithm, or effectively calculable procedure. This culminated in several (very different looking) definitions (Turing Machines, λ-Calculus, Partial Recursive Functions, Register Machines, Cellular Automata, etc.), all of which can be shown to coincide on the functions they can define over the natural numbers (in terms of which most things of interest can be encoded). Perhaps the easiest connection to see is the fact that the λ-calculus can define all of the partial recursive functions, as defined originally by Kurt Gödel.

Intuitively, the partial recursive functions contain a few very simple, `obviously’ effectively computable functions, and are closed under certain ways of combining functions which seem intuitively to preserve effective calculability.

The basic functions are:

0:
\(\textbf{zero}(n) = 0\), for all \(n \in \mathbb{N}\),
successor:
\(\textbf{succ}(n) = n + 1\), for all \(n\in\mathbb{N}\),
projections:
\(\textbf{proj}^n_i(x_1,\ldots,x_n) = x_i\), for all \(i \le n\), and \(x_1,\ldots,x_n \in \mathbb{N}\)

In addition, we define three operators under which the set of partial recursive functions is to be closed.

  • The generalized composition operator `\(\circledcirc\)’ of functions \(g : \mathbb{N}^k \rightarrow \mathbb{N}\) and \(f_1,\ldots,f_k : \mathbb{N}^n \rightarrow \mathbb{N}\), is defined to be the function \(g \circledcirc \langle f_1,\ldots,f_k\rangle : \mathbb{N}^n \rightarrow \mathbb{N}\), such that for every \(x_1,\ldots,x_n \in \mathbb{N}\), \[g\circledcirc \langle f_1,\ldots,f_k\rangle (x_1,\ldots,x_n) = g(f_1(x_1,\ldots,x_n),\ldots,f_k(x_1,\ldots,x_n))\]

  • The primitive recursion operator `\(\rho\)', given functions \(f : \mathbb{N}^k \rightarrow \mathbb{N}\) and \(g : \mathbb{N}^{k+2} \rightarrow \mathbb{N}\), is defined to be \(\rho(f,g) : \mathbb{N}^{k+1} \rightarrow \mathbb{N}\) where, for \(n,x_1,\ldots,x_k \in \mathbb{N}\),

    \(\rho(f,g)(0,x_1,\ldots,x_k) = f(x_1,\ldots,x_k)\)
    \(\rho(f,g)(n+1,x_1,\ldots,x_k) =g(n,x_1,\ldots,x_k,\rho(f,g)(n,x_1,\ldots,x_k))\)

    The primitive recursion operator looks a little puzzling at first, but all it does is ‘count down’ its numerical argument, and apply \(g\) that many times to \(f\): \[\rho(f,g)(3,x,y) = g(2,x,y,g(1,x,y,g(0,x,y,f(x,y))))\]

  • The minimization operator `\(\mu\)’ is, given \(f : \mathbb{N}^{k+1} \rightarrow \mathbb{N}\), the function \(\mu(f) : \mathbb{N}^{k}\rightarrow \mathbb{N}\) where for \(x_1,\ldots,x_k\in\mathbb{N}\),7

    \(\mu(f)(x_1,\ldots,x_k) = \) the smallest number \(n\) such that \(f(n,x_1,\ldots,x_k) = 0\), if one exists. Otherwise, \(\mu(f)(x_1,\ldots,x_k)\uparrow\) (the result is undefined).

The smallest class of functions containing the basic functions above and closed under ⊚ and ρ is called the set of primitive recursive functions. All primitive recursive functions are defined on all of their arguments. Closing the basic functions under μ in addition to ⊚ and ρ yields the class of partial recursive functions, some of which are, as the name of their class suggests, not defined on all arguments in their domain.8 This shows that the partial recursive functions properly include the primitive recursive ones. However, it is not the case that the difference between the two classes lies solely in the fact that the former can define partial functions. Having the minimization operator allows even new total functions to be defined, such as the famous Ackermann function.
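
As an illustration, here is a loose Haskell sketch of the three operators, representing an \(n\)-tuple of naturals as a list of Integers (the names and representation are our own; real definitions track arities more carefully):

    -- Basic functions.
    zeroF :: [Integer] -> Integer
    zeroF _ = 0

    succF :: [Integer] -> Integer
    succF (n : _) = n + 1
    succF []      = error "succF: expects one argument"

    projF :: Int -> [Integer] -> Integer   -- proj^n_i, 1-indexed
    projF i xs = xs !! (i - 1)

    -- Generalized composition: (g ⊚ ⟨f1,…,fk⟩)(x⃗) = g (f1 x⃗, …, fk x⃗).
    compose :: ([Integer] -> Integer) -> [[Integer] -> Integer]
            -> [Integer] -> Integer
    compose g fs xs = g [ f xs | f <- fs ]

    -- Primitive recursion ρ(f,g): count the first argument down.
    rho :: ([Integer] -> Integer) -> ([Integer] -> Integer)
        -> [Integer] -> Integer
    rho f _ (0 : xs) = f xs
    rho f g (n : xs) = g ((n - 1) : xs ++ [rho f g ((n - 1) : xs)])
    rho _ _ []       = error "rho: expects a numeric first argument"

    -- Minimization μ(f): search for the least n with f(n,x⃗) = 0.
    -- Loops forever when no such n exists: this is where partiality enters.
    mu :: ([Integer] -> Integer) -> [Integer] -> Integer
    mu f xs = head [ n | n <- [0 ..], f (n : xs) == 0 ]

For instance, addition can be written as rho (projF 1) (compose succF [projF 3]), while mu succF [] loops forever, matching footnote 8.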

Self-Test Exercise
  1. Show that all of the partial recursive functions can be defined in the λ-calculus. Can you define \(\rho\) without the \(\Theta\) combinator?
  2. Define the minimization operator as a fixed point (i.e. using \(\Theta\)).
  3. Define \(\textsf{append}\), \(\textsf{filter}\), and \(\textsf{exp}\) as fixed points (i.e. using \(\Theta\)).
  4. Define the modulus operation (\(\textsf{mod}\ m\ n\) is the remainder of \(m/n\)).

Metatheory

We have spent some time now using the λ-calculus to define objects and functions over those objects. In so doing, we have seen that β-reduction is a notion of computation, and a normal form is the result of a computation. Some important, foundational, questions arise at this point.

  1. Does every term have a normal form?
  2. Are normal forms unique?
  3. How can we find normal forms?

The answer to question 1 is No: the term \(\Omega = (\lambda x.x\ x)\ (\lambda x.x\ x)\) has no normal form, as it β-reduces to itself in one step. The answer to question 2 is Yes; this is a consequence of the Confluence (or Church-Rosser) property of the relation \(\rightarrow_\beta^*\). Question 3 also has answers; one is given by the leftmost reduction theorem.

Church-Rosser Theorem
\(\rightarrow_\beta^*\) is confluent. In other words, for any \(M,N_1,N_2\) such that \(M\rightarrow_\beta^* N_1\) and \(M\rightarrow_\beta^* N_2\), there is some \(N\) such that \(N_1 \rightarrow_\beta^* N\) and \(N_2\rightarrow_\beta^* N\).

A direct corollary of the above is:

corollary
Any term \(M\) has at most one normal form.
proof
Let \(M\rightarrow_\beta^* N_1\) and \(M\rightarrow_\beta^* N_2\), and let \(N_1, N_2\) be normal forms. Then, by the Church-Rosser Theorem, there is some \(N\) such that \(N_1\rightarrow_\beta^* N\) and \(N_2\rightarrow_\beta^* N\). As \(N_1\) is a normal form, \(N_1 = N\). As \(N_2\) is a normal form, \(N_2 = N\). By transitivity, \(N_1 = N_2\).
Leftmost Reduction Theorem
If \(M\rightarrow_\beta^* N\) and \(N\) is a normal form, then \(N\) can be obtained from \(M\) by β-reduction steps which always reduce the leftmost β-redex.
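
The leftmost strategy is straightforward to implement on the Term and subst sketches from earlier: normal-order reduction contracts the leftmost-outermost redex at each step (names ours, and of course normalize loops on terms such as \(\Omega\)):

    -- One leftmost (normal-order) step, if any redex remains.
    leftmost :: Term -> Maybe Term
    leftmost (App (Abs x m) n) = Just (subst m x n)   -- head redex first
    leftmost (App m n) =
      case leftmost m of                              -- else leftmost in M...
        Just m' -> Just (App m' n)
        Nothing -> App m <$> leftmost n               -- ...then in N
    leftmost (Abs x m) = Abs x <$> leftmost m         -- finally under the λ
    leftmost (Var _)   = Nothing

    -- Iterate; by the theorem, this finds the normal form whenever one exists.
    normalize :: Term -> Term
    normalize t = maybe t normalize (leftmost t)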

  1. This means that \(M \equiv N\) if for all \(x\) not in \(M\) or \(N\), \(M\ x \equiv N\ x\). Thus, with η, we can no longer distinguish between different ways of computing the same functional relation. ↩︎

  2. Note that this definition of substitution only works if we remember to rename bound variables appropriately before each β-reduction step. ↩︎

  3. We set \((M^0\ N) = N\), and \((M^{n+1}\ N) = M\ (M^{n}\ N)\). ↩︎

  4. Alternative formulations are, for \(\textsf{plus}\), \(\lambda m,n,s,z.m\ s\ (n\ s\ z)\), and for \(\textsf{mult}\), \(\lambda m,n,s,z.m\ (n\ s)\ z\). The ones in the main text are nicer pedagogically because they show new functions being defined in terms of old ones. ↩︎

  5. The name \(\textbf{Q}_4\) comes from Chris Rathman’s Combinator Birds page, which apparently collects the combinators discussed in Raymond Smullyan’s To Mock a Mockingbird. ↩︎

  6. The inventor of the λ-calculus, Alonzo Church, had originally thought that this operation couldn’t be defined. His student Stephen Kleene apparently thought of this definition while at the dentist. ↩︎

  7. The symbol \(\uparrow\) indicates that the result is undefined. ↩︎

  8. An easy example is \(\mu(\textbf{succ})\), which is undefined (there is no \(n\) such that \(\textbf{succ}(n) = 0\)). ↩︎