Colloquium

The philosophy of MDL provides a rigorous method to choose between competing theories of some data: choose the simplest theory that describes the data in the simplest way. ‘Crude’ MDL uses a two-part coding scheme to encode data and hypothesis: the hypothesis is encoded the data is encoded by means of the hypothesis In order to carry this out, we need two families of encoding schemes: a scheme \(C_{1}\) that tells us how to encode hypotheses a scheme \(C_{2,H}\), for each hypothesis \(H\), that tells us how to encode data given hypothesis \(H\) Then the best hypothesis given some data \(D\) is one which mininimizes the sum of \(C_{1}(H) + C_{2,H}(D)\).

Statistical Preliminaries probabilistic model remember that a model is a set of hypotheses; a model is the wiggle room that we have for defining a particular theory. So a probabilistic model is a set of probabilistic hypotheses. If we think of a theory as a way of generating predictions, a probabilistic hypothesis/theory is a probabilistic source of predictions. Often, the wiggle room that a model provides is structured in a particular way.

Preliminaries Codes a code for a set of objects \(\mathcal{X}\) assigns each object \(x \in \mathcal{X}\) a unique binary string of length at least one Given a code \(C\), \({C}(x)\) refers to the binary string encoding the object \(x\) As MDL is concerned with the length of things, it is helpful to have a concise notation for that: \(L_{{C}}(x)\) refers to the length of the binary string coding \(x\). In symbols: \(L_{{C}}(x) = \left| {C}(x)\right|\).

The basic idea behind MDL is summarized below. MDL: The basic idea The goal of statistical inference may be cast as trying to find regularity in the data. ‘Regularity’ may be identified with ‘ability to compress’. MDL combines these two insights by viewing learning as data compression: it tells us that, for a given set of hypotheses H and data set D, we should try to find the hypothesis or combination of hypotheses in H that compresses D most.

I will walk through the text here, commenting on things that I think could use being expanded upon. Our judgements in these regards will surely differ! How does one decide among competing explanations of data given limited observations? This is the problem of model selection. It stands out as one of the most important problems of inductive and statistical inference. The Minimum Description Length (MDL) Principle is a relatively recent method for inductive inference that provides a generic solution to the model selection problem.

Grünwald 2004 (part 5)

Grünwald 2004 (part 4)

Grünwald 2004 (part 3)

Grünwald 2004 (part 2)

Grünwald 2004 (part 1)