A Variational Formulation of Accelerated Optimization on Riemannian Manifolds

Abstract

It was shown recently by [W. Su, S. Boyd, and E. Candes, J. Mach. Learn. Res., 17 (2016), pp. 1--43] that Nesterov's accelerated gradient method for minimizing a smooth convex function $f$ can be thought of as the time discretization of a second-order ODE and that $f(x(t))$ converges to its optimal value at a rate of $\mathcal{O}(1/t^2)$ along any trajectory $x(t)$ of this ODE. A variational formulation was introduced in [A. Wibisono, A. Wilson, and M. Jordan, Proc Natl. Acad. Sci. USA, 113 (2016), pp. E7351--E7358] which allowed for accelerated convergence at a rate of $\mathcal{O}(1/t^p)$, for arbitrary $p>0$, in normed vector spaces. This framework was exploited in [V. Duruisseaux, J. Schmitt, and M. Leok, SIAM J. Sci. Comput., 43 (2021), pp. A2949--A2980] using time-adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization. In [F. Alimisis, A. Orvieto, G. Bécigneul, and A. Lucchi, Proceedings of the 23rd International AISTATS Conference, 2020, pp. 1297--1307], a second-order ODE was proposed as the continuous-time limit of a Riemannian accelerated algorithm, and it was shown that the objective function $f(x(t))$ converges to its optimal value at a rate of $\mathcal{O}(1/t^2)$ along solutions of this ODE, thereby generalizing the earlier Euclidean result to the Riemannian manifold setting. In this paper, we show that on Riemannian manifolds, the convergence rate of $f(x(t))$ to its optimal value can also be accelerated to an arbitrary convergence rate $\mathcal{O}(1/t^p)$, by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems on Riemannian manifolds. This generalizes the results of Wibisono, Wilson, and Jordan to Riemannian manifolds and also provides a variational framework for accelerated optimization on Riemannian manifolds. In particular, we will establish results for objective functions on Riemannian manifolds that are geodesically convex, weakly quasi-convex, and strongly convex. An approach based on the time-invariance property of the family of Bregman Lagrangians and Hamiltonians was used to construct very efficient optimization algorithms by Duruisseaux, Schmitt, and Leok, and we establish a similar time-invariance property in the Riemannian setting. This lays the foundation for constructing similarly efficient optimization algorithms on Riemannian manifolds, once the Riemannian analogues of time-adaptive Hamiltonian variational integrators have been developed. The experience with the numerical discretization of variational accelerated optimization flows on vector spaces suggests that the combination of time-adaptivity and symplecticity is important for the efficient, robust, and stable discretization of these variational flows describing accelerated optimization. One expects that a geometric numerical integrator that is time-adaptive, symplectic, and Riemannian manifold preserving will yield a class of similarly promising optimization algorithms on manifolds.

References

Page 1

	Year	Citations

Page 1