Page 185

The present section begins to address these issues by extending our paradigms to model multiple sources of data, some of which are inaccurate. To this end, we conceive of a scientist as receiving a finite number of texts and announcing hypotheses after examining an initial segment of each of them. To keep the present discussion manageable, we confine ourselves to the identification of functions.

To begin, we need to enrich our definition of scientist.

8.40 Definition Let $k \geq 1$. A scientist with $k$ streams is a computable mapping from $\mathrm{SEG}^k$ into $\mathbb{N}$.
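A $k$-stream scientist can be pictured concretely as an ordinary function that takes $k$ finite segments and returns a natural number. The Python sketch below is illustrative only. It assumes that a text for a function is a sequence of $(x, y)$ pairs with `None` standing for a pause, and it replaces indices of a programming system with positions in a small hypothetical list `CANDIDATES`. The scientist `vote_scientist` (also a hypothetical name) lets each stream propose the least candidate consistent with it and returns the most common proposal.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: a text entry is an (x, y) pair from a graph, or None for a pause.
Entry = Optional[Tuple[int, int]]
Segment = Sequence[Entry]

# Hypothetical finite hypothesis space; index i stands for CANDIDATES[i].
CANDIDATES: List[Callable[[int], int]] = [
    lambda x: 0,      # index 0: constant zero
    lambda x: x,      # index 1: identity
    lambda x: 2 * x,  # index 2: doubling
]


def least_consistent(segment: Segment) -> int:
    """Least candidate index agreeing with every pair seen so far (0 if none)."""
    pairs = [e for e in segment if e is not None]
    for i, g in enumerate(CANDIDATES):
        if all(g(x) == y for x, y in pairs):
            return i
    return 0


def vote_scientist(segments: Sequence[Segment]) -> int:
    """A k-stream scientist: a single computable map from k segments into N.

    Each stream proposes its least consistent index; the most frequent
    proposal wins, with ties broken toward the smaller index.
    """
    votes = Counter(least_consistent(s) for s in segments)
    best = max(votes.values())
    return min(i for i, c in votes.items() if c == best)
```

With two streams drawn from the doubling function and one stream drawn from the identity, `vote_scientist([[(1, 2), (2, 4)], [(3, 6), None], [(1, 1)]])` returns 2, because two of the three streams propose index 2. Majority voting is just one way to combine streams, chosen here for brevity.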

8.41 Definition Let $k \geq 1$. Fix a $k$-stream scientist $M$, texts $G_1, G_2, \ldots, G_k$, and $i \in \mathbb{N}$. We say $M$ converges on $G_1, G_2, \ldots, G_k$ to $i$ (written: $M(G_1, G_2, \ldots, G_k)\downarrow = i$) just in case there exists an $n$ such that $M(G_1[n_1], G_2[n_2], \ldots, G_k[n_k]) = i$ whenever $n_1, \ldots, n_k > n$.
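Convergence is a property of infinite texts and cannot be decided from finite data, but the quantifier structure of the definition ("there is an $n$ such that for all $n_1, \ldots, n_k > n$") can be illustrated on a finite window. The sketch below, with hypothetical helper names, tests whether $M$ outputs $i$ on every combination of prefix lengths $n < n_t \leq$ `horizon`. It reads $G[m]$ as the initial segment of $G$ of length $m$ and treats text entries as $(x, y)$ pairs or `None` for a pause; both are assumptions of the sketch. A `True` result is evidence about that window only, not a proof of convergence.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Entry = Optional[Tuple[int, int]]  # (x, y) pair, or None for a pause
Segment = Tuple[Entry, ...]
Scientist = Callable[[Tuple[Segment, ...]], int]


def looks_converged(M: Scientist, prefixes: Sequence[Sequence[Entry]],
                    i: int, n: int, horizon: int) -> bool:
    """Check M(G_1[n_1], ..., G_k[n_k]) == i for all n < n_1, ..., n_k <= horizon.

    `prefixes` are finite stand-ins for the infinite texts G_1, ..., G_k.
    """
    if any(len(p) < horizon for p in prefixes):
        raise ValueError("every prefix needs at least `horizon` entries")
    # Every combination of lengths, one per stream, each in (n, horizon].
    for lengths in product(range(n + 1, horizon + 1), repeat=len(prefixes)):
        segments = tuple(tuple(p[:m]) for p, m in zip(prefixes, lengths))
        if M(segments) != i:
            return False
    return True


def saw_zero_one(segments: Tuple[Segment, ...]) -> int:
    """Toy scientist: outputs 1 once any stream has shown the pair (0, 1)."""
    return 1 if any((0, 1) in s for s in segments) else 0
```

For `G = [None, None, (0, 1), None]`, `looks_converged(saw_zero_one, [G, G], 1, 0, 4)` is `False` (at lengths $(1, 1)$ neither stream has shown $(0, 1)$), while the same call with `n = 2` is `True`, matching the role of the witness $n$ in the definition.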

Next, we define the identification of functions from multiple texts, some of which could be inaccurate.

8.42 Definition Let $j, k \in \mathbb{N}$ with $1 \leq j \leq k$. Let $a, b \in \mathbb{N} \cup \{*\}$.

(a) $M$ $f$ (written: ) just in case for any collection of $k$ texts $G_1, G_2, \ldots, G_k$, at least $j$ of which are $a$-noisy for $f$, $M(G_1, G_2, \ldots, G_k)\downarrow$ and $\varphi_{M(G_1, G_2, \ldots, G_k)} =^b f$.
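The success condition $\varphi_{M(G_1, \ldots, G_k)} =^b f$ can be approximated on a finite domain. In the sketch below (all names hypothetical), a finite list of Python functions stands in for the programs $\varphi_i$, and `eq_up_to` counts disagreements between a hypothesis and $f$ on a chosen finite domain, with the string `"*"` standing for the bound $*$. The convergence half of the condition, $M(G_1, \ldots, G_k)\downarrow$, is not checked here.

```python
from typing import Callable, Iterable, Sequence, Union

Bound = Union[int, str]  # a natural number, or "*" for "finitely many"


def eq_up_to(g: Callable[[int], int], f: Callable[[int], int],
             b: Bound, domain: Iterable[int]) -> bool:
    """Finite check of g =^b f: at most b disagreements on `domain`.

    For b == "*" the answer is always True here, because a finite
    domain can only contain finitely many disagreements.
    """
    if b == "*":
        return True
    disagreements = sum(1 for x in domain if g(x) != f(x))
    return disagreements <= b


def hypothesis_ok(index: int, candidates: Sequence[Callable[[int], int]],
                  f: Callable[[int], int], b: Bound, domain: Iterable[int]) -> bool:
    """Does the conjectured index name a candidate that is =^b f on `domain`?"""
    return 0 <= index < len(candidates) and eq_up_to(candidates[index], f, b, domain)
```

For $f(x) = x^2$ and a variant $g$ that differs from $f$ only at $x = 3$, `eq_up_to(g, f, 0, range(10))` is `False` and `eq_up_to(g, f, 1, range(10))` is `True`.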

(b) .

(c) and are defined similarly to .

8.43 Definition When discussing identification criteria involving multiple data sources, we refer to those texts whose number of inaccuracies is within the bound required by the criterion as good texts.
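Sorting texts into good and bad amounts to counting inaccuracies against the bound. The sketch below assumes, as a simplification that may differ from the exact notion of an $a$-noisy text used here, that an inaccuracy is a distinct entry $(x, y)$ with $y \neq f(x)$, and that a text is good when it has at most $a$ of them (every finite prefix passes when $a$ is `"*"`). All names are hypothetical. Because entries only accumulate, a finite prefix can show that a text is not good, but never that it is.

```python
from typing import Callable, Optional, Sequence, Tuple, Union

Entry = Optional[Tuple[int, int]]  # (x, y) pair, or None for a pause
Bound = Union[int, str]            # a natural number, or "*"


def inaccuracies(prefix: Sequence[Entry], f: Callable[[int], int]) -> int:
    """Number of distinct entries (x, y) in the prefix with y != f(x)."""
    return len({e for e in prefix if e is not None and e[1] != f(e[0])})


def is_good(prefix: Sequence[Entry], f: Callable[[int], int], a: Bound) -> bool:
    """Good so far: inaccuracies within the bound a (always, for a == "*")."""
    return a == "*" or inaccuracies(prefix, f) <= a


def at_least_j_good(prefixes: Sequence[Sequence[Entry]],
                    f: Callable[[int], int], a: Bound, j: int) -> bool:
    """Finite-prefix approximation of "at least j of the k texts are good"."""
    return sum(is_good(p, f, a) for p in prefixes) >= j
```

With $f(x) = x + 1$, the prefixes `[(0, 1), (1, 2), None]`, `[(0, 1), (0, 5), (0, 5)]`, and `[(0, 7), (1, 9)]` contain 0, 1, and 2 inaccuracies, so for $a = 1$ exactly two of the three are good so far.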

The next three propositions describe situations in which learning from multiple inaccurate texts is no more difficult than learning from a single inaccurate text.

8.44 Proposition Let j and be such that and let . Then:

(a) .

(b) .

(c) .
