Publication | Closed Access
An accelerated gradient method for trace norm minimization
536 Citations | 27 References | Year: 2009 | Venue: Unknown
Topics: Numerical Analysis · Mathematical Programming · Engineering · Machine Learning · Smooth Loss Function · Trace Norm · Trace Norm Minimization · Data Science · Derivative-free Optimization · Multi-task Learning · Regularization (Mathematics) · Approximation Theory · Low-rank Approximation · Large Scale Optimization · Inverse Problems · Computer Science · Deep Learning · Nondifferentiable Optimization · Matrix Factorization · Optimal Convergence Rate
Trace‑norm regularized smooth loss minimization is widely used in machine learning tasks such as multi‑task learning, matrix classification, and matrix completion, but standard semidefinite programming approaches are computationally expensive, and first‑order black‑box methods converge only at O(1/√k). Exploiting the special structure of the trace norm, the authors propose an extended gradient algorithm that converges at O(1/k) and an accelerated gradient algorithm that attains the optimal O(1/k²) rate for smooth problems. Experiments on multi‑task learning problems confirm that the proposed algorithms are more efficient than existing methods.
We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. This formulation finds applications in many machine learning tasks, including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the non-smooth nature of the trace norm, the optimal first-order black-box method for solving this class of problems converges as O(1/√k), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O(1/k²) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
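The abstract describes a proximal-gradient scheme: a gradient step on the smooth loss followed by a trace-norm proximal step, which reduces to soft-thresholding the singular values, plus Nesterov-style momentum to reach the O(1/k²) rate. The sketch below illustrates that general scheme on a least-squares instance; it is not the paper's reference implementation, and the least-squares loss, the fixed step size 1/L, and the names (`svt`, `accelerated_trace_norm`, `lam`) are illustrative assumptions.

```python
# Minimal sketch of accelerated proximal gradient descent for
#   min_W  0.5 * ||X W - Y||_F^2 + lam * ||W||_*,
# where ||W||_* is the trace (nuclear) norm. Assumptions: a least-squares
# smooth loss and a fixed step size 1/L; the paper's method applies to
# general smooth losses.
import numpy as np

def svt(A, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def accelerated_trace_norm(X, Y, lam, n_iter=500):
    d, m = X.shape[1], Y.shape[1]
    L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the gradient
    W = Z = np.zeros((d, m))
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ Z - Y)             # gradient of the smooth loss at Z
        W_next = svt(Z - grad / L, lam / L)  # gradient step + trace-norm prox
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = W_next + ((t - 1.0) / t_next) * (W_next - W)  # momentum extrapolation
        W, t = W_next, t_next
    return W
```

Dropping the momentum update (setting Z = W_next each iteration) gives the plain proximal gradient scheme with the slower O(1/k) rate; when the Lipschitz constant is unknown, a line search can replace the fixed step 1/L.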