Page 185
The present section begins to deal with these issues by extending our paradigms to model multiple sources of data, some of which may be inaccurate. For this purpose, scientists will be conceived as receiving a finite number of texts and announcing hypotheses after examining some initial segment of each of them. To keep the present discussion manageable, we confine ourselves to the identification of functions.
To begin, we need to enrich our definition of scientist.
8.40 Definition Let k ≥ 1. A scientist with k streams is a computable mapping from SEG^k into N.
8.41 Definition Let k ≥ 1. Fix a k-stream scientist M, texts G1, G2, . . . , Gk, and i ∈ N. We say M converges on G1, G2, . . . , Gk to i (written: M(G1, G2, . . . , Gk)↓ = i) just in case there exists an n such that M(G1[n1], G2[n2], . . . , Gk[nk]) = i whenever n1, . . . , nk > n.
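To make the convergence condition of Definition 8.41 concrete, the following sketch models a 2-stream scientist operating on finite prefixes. The scientist shown (a frequency-vote guesser for a constant function) and the finite bound are illustrative assumptions, not constructions from the text; the definition itself quantifies over all n1, n2 beyond some n.

```python
from collections import Counter

# Toy 2-stream "scientist": a computable map from a pair of initial
# segments to a conjecture in N. Here the conjecture is simply the most
# frequent value seen across both segments (0 on empty data). This
# particular scientist is an assumption for illustration only.
def scientist(seg1, seg2):
    data = list(seg1) + list(seg2)
    if not data:
        return 0
    return Counter(data).most_common(1)[0][0]

# Finite stand-in for Definition 8.41: check that for some n <= bound,
# M(G1[n1], G2[n2]) == i whenever bound >= n1, n2 > n - 1.
def converges_to(M, G1, G2, i, bound):
    for n in range(bound + 1):
        if all(M(G1[:n1], G2[:n2]) == i
               for n1 in range(n, bound + 1)
               for n2 in range(n, bound + 1)):
            return True
    return False

# Two texts "for" the constant value 5; the second carries one noisy item.
G1 = [5] * 10
G2 = [7] + [5] * 9
```

Running `converges_to(scientist, G1, G2, 5, 10)` succeeds here because, past the first few prefixes, the value 5 dominates both segments regardless of the single noisy item.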
Next, we define the identification of functions from multiple texts, some of which could be inaccurate.
8.42 Definition Let j, k ∈ N with 1 ≤ j ≤ k. Let a, b ∈ N ∪ {*}.
(a) M Mul^j_k N^a Ex^b-identifies f (written: f ∈ Mul^j_k N^a Ex^b(M)) just in case for any collection of k texts, G1, G2, . . . , Gk, at least j of which are a-noisy for f, M(G1, G2, . . . , Gk)↓ and φ_{M(G1, G2, . . . , Gk)} =^b f.
(b) Mul^j_k N^a Ex^b = {S : (∃M)[S ⊆ Mul^j_k N^a Ex^b(M)]}.
(c) Mul^j_k In^a Ex^b and Mul^j_k Im^a Ex^b are defined similarly to Mul^j_k N^a Ex^b.
8.43 Definition When discussing identification criteria involving multiple data sources, those texts for which the number of inaccuracies is within the bound required by the criteria will be referred to as good texts.
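The notion of a good text from Definition 8.43 can be illustrated on finite data. In the sketch below (a finite-domain assumption, not the text's infinite setting), a text for f is a list of (x, f(x)) pairs, and a text counts as a-noisy for f when it covers f's entire graph and contains at most a pairs outside the graph; the helper name `is_a_noisy` is invented for this example.

```python
# Classify each of k finite "texts" for a function f as good or not.
# "a-noisy" here: the text contains every pair of f's graph plus at
# most `a` pairs lying outside the graph.
def is_a_noisy(text, f_graph, a):
    seen = set(text)
    noise = [p for p in seen if p not in f_graph]
    return f_graph <= seen and len(noise) <= a

f_graph = {(0, 0), (1, 2), (2, 4)}          # graph of f(x) = 2x on {0, 1, 2}
texts = [
    [(0, 0), (1, 2), (2, 4)],               # accurate: 0 noise items
    [(0, 0), (1, 2), (2, 4), (3, 9)],       # 1 noise item: still good for a = 1
    [(0, 0), (1, 3), (2, 4)],               # omits (1, 2): not a-noisy for f
]
good = [is_a_noisy(t, f_graph, a=1) for t in texts]   # [True, True, False]
```

With a = 1, the first two texts are good in the sense of Definition 8.43, while the third fails regardless of a, since it omits part of f's graph.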
The next three propositions describe situations in which learning from multiple inaccurate texts is no more difficult than learning from a single inaccurate text.
8.44 Proposition Let j and k be such that 2·j > k and let a, b ∈ N ∪ {*}. Then:
(a) N^a Ex^b ⊆ Mul^j_k N^a Ex^b.
(b) In^a Ex^b ⊆ Mul^j_k In^a Ex^b.
(c) Im^a Ex^b ⊆ Mul^j_k Im^a Ex^b.
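One intuition behind results of this kind (a simplified sketch, not the book's proof) is majority voting: when a strict majority of the k streams is good, a hypothesis emitted by more than half of the per-stream conjectures must be emitted by at least one good stream, so the bad streams cannot outvote the good ones. The helper below is a hypothetical illustration of that voting step only.

```python
from collections import Counter

# Return a hypothesis emitted by a strict majority of the k streams'
# current conjectures, or None when no strict majority exists. This is
# an illustration of the 2j > k condition, not the proposition's proof.
def majority_conjecture(conjectures):
    k = len(conjectures)
    value, count = Counter(conjectures).most_common(1)[0]
    return value if 2 * count > k else None

# 5 streams of which 3 (a strict majority) converged to hypothesis 42:
print(majority_conjecture([42, 42, 42, 7, 13]))   # -> 42
```

With 2·j > k, any conjecture held by j good streams forms such a majority; with j ≤ k/2, the vote can deadlock or be hijacked by bad streams, which is why some bound of this shape is needed.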