Publication | Closed Access
A Simple Weight Decay Can Improve Generalization
Year: 1991 | Citations: 1.3K | References: 10 | Venue: Unknown
Topics: Artificial Intelligence, Engineering, Machine Learning, Weight Decay, Network Analysis, Recurrent Neural Network, Pattern Recognition, Sparse Neural Network, Numerical Simulations, Neural Scaling Law, Computational Learning Theory, Weight Vector, Machine Learning Model, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, Feature Scaling, Evolving Neural Network, Statistical Inference
Numerical simulations have shown that weight decay can improve generalization in feed-forward neural networks. This paper explains why. For a linear network, it is proven that weight decay has two effects. First, it suppresses irrelevant components of the weight vector by selecting the smallest weight vector that solves the learning problem. Second, if its strength is chosen correctly, weight decay can suppress part of the effect of static noise on the targets, which can improve generalization considerably. The paper then shows how to extend these results to networks with hidden layers and non-linear units. Finally, the theory is confirmed by numerical simulations using the NetTalk data.
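Both effects claimed for the linear network can be illustrated numerically. The sketch below is a minimal illustration under assumptions of our own, not the paper's code or experimental setup. It assumes the cost is squared error plus a penalty (λ/2)·||w||², that inputs and teacher weights are random Gaussian, and that the helper names `weight_decay_gd`, `ridge` and `expected_weight_error` are hypothetical. For the first effect, it tracks the part of `w` lying in input directions that the training set never spans. For the second effect, it computes the exact noise-averaged weight error of the decayed solution for several values of λ.

```python
import numpy as np


def weight_decay_gd(X, y, lam, lr, steps, w0):
    """Gradient descent on E(w) = 1/2 ||X w - y||^2 + lam/2 ||w||^2 (linear network, assumed cost)."""
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) + lam * w  # error gradient plus weight-decay term
        w = w - lr * grad
    return w


def ridge(X, y, lam):
    """Exact minimizer of the same cost, via least squares on an augmented system."""
    p = X.shape[1]
    A = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    b = np.concatenate([y, np.zeros(p)])
    return np.linalg.lstsq(A, b, rcond=None)[0]


def expected_weight_error(X, w_teacher, sigma, lam):
    """Noise-averaged ||w_ridge - w_teacher||^2 when y = X w_teacher + noise of variance sigma^2."""
    d, Q = np.linalg.eigh(X.T @ X)
    alpha = Q.T @ w_teacher  # teacher weights in the eigenbasis of X^T X
    # variance term sigma^2 d/(d+lam)^2 plus squared-bias term lam^2 alpha^2/(d+lam)^2
    return float(np.sum((sigma**2 * d + lam**2 * alpha**2) / (d + lam) ** 2))


rng = np.random.default_rng(0)

# Effect 1: 5 examples, 10 weights, noiseless targets (many exact solutions exist).
X1 = rng.standard_normal((5, 10))
y1 = X1 @ rng.standard_normal(10)
w0 = rng.standard_normal(10)                   # random initial weights
P_null = np.eye(10) - np.linalg.pinv(X1) @ X1  # projector onto input directions absent from training
lr, steps, lam = 0.01, 3000, 0.5
w_plain = weight_decay_gd(X1, y1, 0.0, lr, steps, w0)
w_decay = weight_decay_gd(X1, y1, lam, lr, steps, w0)
w_min = np.linalg.pinv(X1) @ y1                # smallest-norm exact solution
print("irrelevant part, no decay:  ", np.linalg.norm(P_null @ w_plain))
print("irrelevant part, with decay:", np.linalg.norm(P_null @ w_decay))

# Effect 2: 25 examples, 20 weights, targets corrupted by static noise of std sigma.
X2 = rng.standard_normal((25, 20))
w_teacher = rng.standard_normal(20)
sigma = 1.0
for lam2 in [0.0, 0.01, 0.1, 1.0, 10.0]:
    err = expected_weight_error(X2, w_teacher, sigma, lam2)
    # for test inputs with identity covariance, expected test error = sigma^2 + err
    print(f"lambda={lam2:<5} expected test error = {sigma**2 + err:.3f}")
```

Without decay, the component in unseen directions stays at its initial value, because the error gradient never points there. With decay, that component shrinks by a factor (1 - lr·λ) per step. As λ → 0, the decayed solution approaches the pseudo-inverse (minimum-norm) solution. For the second effect, the printed numbers depend on the random draw. In this setting, a sufficiently small positive λ always gives lower expected error than λ = 0 (a classical ridge-regression result), while too large a λ trades the noise reduction for bias.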