Week 3 [2023-04-19 Wed]

We discussed language modeling.

In the strict sense, a language model is a probability distribution over the strings of a language. A categorical language model (aka a grammar) can be thought of as describing a set of strings; this models the grammatical/ungrammatical distinction. A probabilistic language model extends this by assigning each string a weight (aka a probability). These weights can be used to predict more kinds of performance phenomena than a categorical model can.
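To make the contrast concrete, here is a minimal sketch in Python. The two-sentence toy language, the probability values, and the function names are all made up for illustration; they are not from the class materials.

```python
# Hypothetical toy language: the sentences and weights are invented for illustration.
GRAMMATICAL = {"the dog barks", "the cat sleeps"}

# Weights chosen arbitrarily so that they sum to 1 over the toy language.
PROBABILITIES = {"the dog barks": 0.75, "the cat sleeps": 0.25}


def categorical_model(string: str) -> bool:
    """Categorical view: a string is either in the language or it is not."""
    return string in GRAMMATICAL


def probabilistic_model(string: str) -> float:
    """Probabilistic view: every string gets a weight; unlisted strings get 0."""
    return PROBABILITIES.get(string, 0.0)


for s in ["the dog barks", "the cat sleeps", "barks dog the"]:
    print(s, categorical_model(s), probabilistic_model(s))
```

The categorical model treats both grammatical sentences alike, while the probabilistic one also ranks them against each other. That ranking is the extra information that can be matched against graded performance data.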

N-grams are a very simple, collocation-based approach to language modeling. You can read up on them here.
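As a rough illustration of the idea, the sketch below estimates a bigram (2-gram) model by counting adjacent word pairs. It assumes whitespace tokenization, unsmoothed relative-frequency estimates, and `<s>`/`</s>` boundary markers; the three-sentence corpus and all function names are hypothetical.

```python
from collections import Counter


def pad(sentence):
    """Split on whitespace and add sentence-boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram(corpus):
    """Count how often each word occurs as a context and each adjacent pair occurs."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        tokens = pad(sentence)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def bigram_prob(prev, cur, context_counts, bigram_counts):
    """Relative-frequency estimate of P(cur | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]


def sentence_prob(sentence, context_counts, bigram_counts):
    """Probability of a sentence as the product of its bigram probabilities."""
    tokens = pad(sentence)
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, cur, context_counts, bigram_counts)
    return p


# Toy corpus, invented for illustration.
corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]
context_counts, bigram_counts = train_bigram(corpus)

print(sentence_prob("the dog barks", context_counts, bigram_counts))
print(sentence_prob("the cat barks", context_counts, bigram_counts))
```

Because nothing is smoothed here, any sentence containing an unseen word pair (such as "cat barks" in this corpus) gets probability zero. Practical n-gram models usually add smoothing to avoid that.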

For next time, we will discuss using surprisal to model linguistic data.