Publication | Closed Access
Efficient mini-batch training for stochastic optimization
Citations: 763 | References: 27 | Year: 2014
Unknown Venue
Topics: Model Optimization, Efficient Mini-batch Training, Engineering, Machine Learning, Data Science, Stochastic Optimization, Stochastic Gradient Descent, Parallel Learning, Large Scale Optimization, Convergence Rate, Parallel Programming, Computer Science, Approximate Optimization, Parallel Computing, Deep Learning, Adaptive Optimization
Stochastic gradient descent (SGD) is widely used for large-scale machine-learning optimization. Parallelizing SGD requires minibatch training to keep communication costs low, but larger minibatches usually slow convergence. The study proposes approximately optimizing a conservatively regularized objective within each minibatch. The authors prove that the convergence rate does not decrease as the minibatch size grows, and they show experimentally that the algorithm can outperform standard SGD in many settings.
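A minimal sketch of what a "conservatively regularized objective within each minibatch" can look like, in assumed notation that is not quoted from the paper: I_t is the minibatch at step t, b its size, f_i the per-example loss, and gamma_t the weight of a proximal term that keeps the new iterate close to the previous one. Replacing each loss by its first-order expansion shows that an ordinary minibatch SGD step with step size 1/gamma_t is the linearized special case:

```latex
% Per-minibatch subproblem (assumed form)
w_t = \arg\min_{w}\; \frac{1}{b}\sum_{i \in I_t} f_i(w) + \frac{\gamma_t}{2}\,\lVert w - w_{t-1}\rVert^2

% First-order expansion of each loss around w_{t-1}
f_i(w) \approx f_i(w_{t-1}) + \nabla f_i(w_{t-1})^\top (w - w_{t-1})

% Setting the gradient of the linearized objective to zero
\frac{1}{b}\sum_{i \in I_t} \nabla f_i(w_{t-1}) + \gamma_t\,(w - w_{t-1}) = 0
\;\Longrightarrow\;
w_t = w_{t-1} - \frac{1}{\gamma_t}\cdot\frac{1}{b}\sum_{i \in I_t} \nabla f_i(w_{t-1})
```

Approximately solving the full subproblem, rather than only its linearization, is what separates this approach from plain minibatch SGD.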
Stochastic gradient descent (SGD) is a popular technique for large-scale optimization problems in machine learning. In order to parallelize SGD, minibatch training needs to be employed to reduce the communication cost. However, an increase in minibatch size typically decreases the rate of convergence. This paper introduces a technique based on approximate optimization of a conservatively regularized objective function within each minibatch. We prove that the convergence rate does not decrease with increasing minibatch size. Experiments demonstrate that with suitable implementations of approximate optimization, the resulting algorithm can outperform standard SGD in many scenarios.
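The following is a minimal pure-Python sketch, not the authors' implementation, contrasting a plain minibatch SGD step with an approximately solved regularized step. It assumes a linear model with squared loss, a constant regularization weight `gamma`, and a hypothetical inner solver made of a fixed number of gradient-descent iterations warm-started at the previous iterate; the abstract only says that "suitable implementations of approximate optimization" are used, so any inner optimizer could stand in here. All function names are illustrative.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def batch_loss(w, batch):
    # Average squared loss: (1/b) * sum_i 0.5 * (w . x_i - y_i)^2
    return sum(0.5 * (dot(w, x) - y) ** 2 for x, y in batch) / len(batch)


def batch_grad(w, batch):
    # Gradient of batch_loss with respect to w
    g = [0.0] * len(w)
    for x, y in batch:
        r = dot(w, x) - y
        for j, xj in enumerate(x):
            g[j] += r * xj
    return [gj / len(batch) for gj in g]


def sgd_step(w_prev, batch, gamma):
    # Standard minibatch SGD: one gradient step with step size 1/gamma
    g = batch_grad(w_prev, batch)
    return [wp - gj / gamma for wp, gj in zip(w_prev, g)]


def subproblem(w, w_prev, batch, gamma):
    # Assumed regularized objective: batch_loss(w) + (gamma/2) * ||w - w_prev||^2
    prox = sum((wi - pi) ** 2 for wi, pi in zip(w, w_prev))
    return batch_loss(w, batch) + 0.5 * gamma * prox


def regularized_step(w_prev, batch, gamma, inner_steps=20):
    # Hypothetical inner solver: gradient descent on subproblem(), started at w_prev.
    # mean ||x||^2 bounds the curvature of the squared loss, so the step size
    # 1/(L + gamma) makes every inner iteration decrease the subproblem.
    lipschitz = sum(dot(x, x) for x, _ in batch) / len(batch)
    lr = 1.0 / (lipschitz + gamma)
    w = list(w_prev)
    for _ in range(inner_steps):
        g = batch_grad(w, batch)
        w = [wi - lr * (gj + gamma * (wi - pi))
             for wi, gj, pi in zip(w, g, w_prev)]
    return w


def train(step_fn, data, batch_size, gamma, epochs, seed=0):
    # Reshuffle every epoch and apply step_fn to consecutive minibatches
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            w = step_fn(w, batch, gamma)
    return w


# Usage on synthetic, noiseless data (an assumption made for illustration)
gen = random.Random(1)
w_true = [2.0, -1.0]
data = []
for _ in range(200):
    x = [gen.gauss(0, 1), gen.gauss(0, 1)]
    data.append((x, dot(w_true, x)))

w_reg = train(regularized_step, data, batch_size=50, gamma=1.0, epochs=5)
w_sgd = train(sgd_step, data, batch_size=50, gamma=1.0, epochs=5)
print(batch_loss(w_reg, data), batch_loss(w_sgd, data))
```

Because the data are noiseless, `w_true` minimizes every per-example loss, so both methods should drive the loss toward zero. The sketch only illustrates the structure of the update and makes no claim about which method is faster; raising `inner_steps` spends more computation per minibatch (and thus per communication round) on solving the subproblem more accurately.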