Optimal Stopping and Effective Machine Complexity in Learning

Abstract

We study the problem of when to stop learning a class of feedforward networks, namely networks with a linear output neuron and fixed input weights, when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, we show that the generalization performance generally passes through three distinct phases during learning; in particular, the network generalizes better when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of the effective size of a machine is defined and used to explain the trade-off between the complexity of the machine and the training error during learning. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion to the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of ne...
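To make the setting concrete, here is a minimal stdlib-only Python sketch of the kind of learner described above. A hidden layer has input weights that are drawn once and then frozen, and only the linear output neuron is trained by batch gradient descent from zero on a small noisy sample. Training is stopped at the step with the lowest error on held-out data. The toy task (y = sin 3x plus Gaussian noise), the tanh hidden units, the learning rate, and all helper names (`make_features`, `train`, `penalized_estimate`, ...) are hypothetical choices. As a stand-in for an effective machine size, the sketch tracks the trace of the hat matrix of the step-t fit. For gradient descent from zero this equals the sum over i of 1 - (1 - lr*lambda_i)^t, where lambda_i are the eigenvalues of F^T F / n. The sketch also plugs this quantity into a Mallows' Cp/AIC-style penalty. Neither is claimed to be the paper's exact definition or criterion.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matmul(p, q):
    """Product of two square matrices stored as lists of rows."""
    cols = list(zip(*q))
    return [[dot(row, col) for col in cols] for row in p]


def make_features(num_hidden, rng):
    """Hypothetical hidden layer: tanh units whose input weights are drawn once and frozen."""
    params = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)) for _ in range(num_hidden)]
    return lambda x: [math.tanh(a * x + c) for a, c in params]


def make_data(n, noise, rng):
    """Hypothetical toy task: y = sin(3x) + Gaussian noise, x uniform on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, [math.sin(3.0 * x) + rng.gauss(0.0, noise) for x in xs]


def mse(w, feats, ys):
    """Mean squared error of the linear output neuron with weights w."""
    return sum((dot(w, f) - y) ** 2 for f, y in zip(feats, ys)) / len(ys)


def train(phi, train_set, val_set, lr=0.05, steps=2000):
    """Batch gradient descent on the output weights only, starting from w = 0.

    Returns ([(t, train_mse, val_mse, d_eff), ...], best_t), where best_t is the
    step with the lowest validation error (the early-stopping time).
    """
    ftr = [phi(x) for x in train_set[0]]
    fva = [phi(x) for x in val_set[0]]
    ytr, yva = train_set[1], val_set[1]
    n, d = len(ftr), len(ftr[0])
    # With loss (1/2n)||Fw - y||^2 the gradient is A w - b, A = F^T F / n, b = F^T y / n.
    a = [[sum(f[i] * f[j] for f in ftr) / n for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(ftr, ytr)) / n for i in range(d)]
    # |tanh| < 1 gives lambda_max(A) <= trace(A) < d, so lr < 1/d keeps 1 - lr*lambda in (0, 1].
    contraction = [[(1.0 if i == j else 0.0) - lr * a[i][j] for j in range(d)] for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # (I - lr*A)^t
    w = [0.0] * d
    history = []
    for t in range(1, steps + 1):
        grad = [dot(a[i], w) - b[i] for i in range(d)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        power = matmul(power, contraction)
        # Effective size (illustrative): trace of the hat matrix of the step-t fit,
        # sum_i [1 - (1 - lr*lambda_i)^t] = d - trace((I - lr*A)^t).
        d_eff = d - sum(power[i][i] for i in range(d))
        history.append((t, mse(w, ftr, ytr), mse(w, fva, yva), d_eff))
    best_t = min(history, key=lambda h: h[2])[0]
    return history, best_t


def penalized_estimate(train_err, d_eff, n, sigma2):
    """Mallows' Cp / AIC-style prediction-error estimate with d_eff in place of the
    parameter count. An illustrative choice, not the criterion derived in the paper."""
    return train_err + 2.0 * sigma2 * d_eff / n


# Usage with hypothetical settings: 10 hidden units, 30 noisy training examples.
rng = random.Random(0)
phi = make_features(10, rng)
train_set = make_data(30, 0.3, rng)
val_set = make_data(200, 0.3, rng)
history, best_t = train(phi, train_set, val_set)
_, tr_err, va_err, d_eff = history[best_t - 1]
print(f"stop at t={best_t}: train={tr_err:.3f}, val={va_err:.3f}, d_eff={d_eff:.2f} of 10")
print(f"Cp-style estimate: {penalized_estimate(tr_err, d_eff, 30, 0.3 ** 2):.3f}")
```

Each eigendirection of F^T F / n is fitted at its own rate. As a result, d_eff rises from 0 toward the nominal number of output weights as t grows, while the training error can only decrease. The stopping time therefore selects a machine whose effective size may sit below its nominal size. The exact numbers depend on the random seed and the hypothetical settings above.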
