
Publication | Closed Access

Optimal stopping and effective machine complexity in learning

Citations: 55
References: 5
Year: 2002

Abstract

We study learning in a general class of machines which return a (variable) linear form of a (fixed) set of nonlinear transformations of points in an input space. A fixed machine in this class accepts inputs $X$ from an arbitrary input space and produces scalar outputs

$$Y = \sum_{i=1}^{d} \psi_i(X)\, w_i^* + \xi = \psi(X)' w^* + \xi. \qquad (1)$$

Here, $w^* = (w_1^*, \ldots, w_d^*)'$ is a fixed vector of real weights representing the target concept to be learned; for each $i$, $\psi_i(X)$ is a fixed real function of the inputs, with $\psi(X) = (\psi_1(X), \ldots, \psi_d(X))'$ the corresponding vector of functions; and $\xi$ is a random noise term. We suppose that the learner receives an i.i.d. random sample of examples $(X_1, Y_1), \ldots, (X_n, Y_n)$ generated according to the joint distribution on input-output pairs $(X, Y)$ induced by the (unknown) relation (1) and a fixed (unknown) distribution on input-noise pairs $(X, \xi)$. The goal of the learner is to infer a hypothesis $w = (w_1, \ldots, w_d)'$ with small (mean-square) generalisation error $E(Y - \psi(X)'w)^2$ on future random examples $(X, Y)$ generated independently of the training sample from the same underlying distribution, where $E$ denotes expectation with respect to the underlying probability distribution generating the examples.
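The setup in relation (1) can be made concrete with a short simulation. The sketch below is not the authors' code: the polynomial feature map, uniform input distribution, Gaussian noise, sample sizes, and the ordinary-least-squares learner are all illustrative assumptions, and the generalisation error $E(Y - \psi(X)'w)^2$ is estimated empirically on an independent sample.

```python
# A minimal sketch (assumptions noted below) of the learning model in the
# abstract: outputs are a noisy linear form of fixed nonlinear features,
# Y = psi(X)' w* + xi  (1).
import numpy as np

rng = np.random.default_rng(0)

d = 8  # number of fixed nonlinear transformations psi_i (assumed)

def psi(X):
    """Fixed vector of nonlinear transformations psi(X) = (psi_1(X), ..., psi_d(X))'.
    A polynomial feature map is assumed here purely for illustration."""
    return np.stack([X ** i for i in range(d)], axis=-1)

w_star = rng.normal(size=d)   # fixed target weight vector w*
noise_sd = 0.5                # scale of the random noise term xi (assumed Gaussian)

def sample(n):
    """Draw an i.i.d. sample (X_1, Y_1), ..., (X_n, Y_n) from relation (1)."""
    X = rng.uniform(-1.0, 1.0, size=n)  # assumed input distribution
    Y = psi(X) @ w_star + noise_sd * rng.normal(size=n)
    return X, Y

# Training sample, plus an independent sample to estimate the
# mean-square generalisation error E(Y - psi(X)'w)^2.
X_train, Y_train = sample(50)
X_test, Y_test = sample(100_000)

# One natural learner for this class: ordinary least squares on the features.
w_hat, *_ = np.linalg.lstsq(psi(X_train), Y_train, rcond=None)

gen_error = np.mean((Y_test - psi(X_test) @ w_hat) ** 2)
print(f"estimated generalisation error: {gen_error:.4f}")
print(f"noise floor (irreducible error): {noise_sd**2:.4f}")
```

The estimated error sits above the noise floor $E\xi^2$ by an amount reflecting estimation error in $w$; the paper's subject, optimal stopping of iterative training, concerns how to control that excess without fitting the noise, and is not implemented in this sketch.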
