Publication | Open Access
Gradient-based Hyperparameter Optimization through Reversible Learning
Citations: 403
References: 24
Year: 2015
Artificial Intelligence, Derivatives Backwards, Model Optimization, Hyperparameter Estimation, Engineering, Machine Learning, Data Science, Model Tuning, Parameter Tuning, Computer Science, Deep Learning, Exact Gradients, Gradient-based Hyperparameter Optimization, Hyperparameter Gradients, Adaptive Optimization
Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.
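The idea of reversing stochastic gradient descent with momentum can be illustrated with a minimal sketch. It assumes one common form of the momentum update (velocity `v`, decay `gamma`, step size `lr`) and a hypothetical toy loss; none of these names come from the abstract. In ordinary floating point the reversal recovers the starting state only up to rounding error, so the sketch shows the principle rather than the exact reversal the abstract describes.

```python
def grad(w):
    # Gradient of a hypothetical toy loss L(w) = 0.5 * (w - 3)^2.
    return w - 3.0


def sgd_momentum_forward(w, v, lr, gamma, steps):
    # Assumed update rule: v <- gamma * v - (1 - gamma) * grad(w); w <- w + lr * v.
    for _ in range(steps):
        g = grad(w)
        v = gamma * v - (1.0 - gamma) * g
        w = w + lr * v
    return w, v


def sgd_momentum_reverse(w, v, lr, gamma, steps):
    # Undo each forward step in the opposite order.
    for _ in range(steps):
        w = w - lr * v                        # undo the parameter step
        g = grad(w)                           # gradient at the recovered parameters
        v = (v + (1.0 - gamma) * g) / gamma   # undo the velocity update
    return w, v


w0, v0 = 0.0, 0.0
w_T, v_T = sgd_momentum_forward(w0, v0, lr=0.1, gamma=0.9, steps=50)
w_rec, v_rec = sgd_momentum_reverse(w_T, v_T, lr=0.1, gamma=0.9, steps=50)
print(abs(w_rec - w0), abs(v_rec - v0))  # small reconstruction errors
```

Because the training trajectory can be retraced from the final state, derivatives can in principle be chained backwards through every step without storing all intermediate parameters. The division by `gamma` amplifies rounding error at each reverse step, which is why an exact scheme needs more care than this sketch provides.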