Riemannian Stochastic Gradient Method for Nested Composition Optimization

Dewei Zhang; Sam Davanloo Tajbakhsh

TL;DR

This work presents a Riemannian Stochastic Composition Gradient Descent method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $\mathcal{O}\left(\epsilon^{-2}\right)$ calls to the stochastic gradient oracle of the outer function and stochastic function and gradient oracles of the inner function.

Abstract

This work considers the optimization of nested compositions of functions over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning and model customization in meta-learning. Standard Riemannian stochastic gradient methods for non-compositional optimization cannot be applied directly, because stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structures, with the same complexity of $\mathcal{O}(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated on a policy evaluation problem in reinforcement learning.
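The two-level structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' R-SCGD algorithm: it is a generic stochastic compositional gradient loop, assuming the unit sphere as the manifold, a linear inner map $g(x) = Ax$ observed with noise, and a quadratic outer function $f(y) = \tfrac{1}{2}\|y - b\|^2$. The function names, step sizes `alpha` and `beta`, and the noise model are all hypothetical choices made for illustration. The point it shows is the auxiliary variable `y`, a running average that tracks the inner function value so that the outer gradient is not evaluated at a single noisy inner sample.

```python
import numpy as np

def proj_tangent(x, v):
    # Project an ambient vector v onto the tangent space of the unit sphere at x.
    return v - np.dot(x, v) * x

def retract(x, v):
    # Map a tangent step back onto the sphere by normalization (a standard retraction).
    z = x + v
    return z / np.linalg.norm(z)

def objective(A, b, x):
    # Noise-free composite objective f(g(x)) for the assumed toy problem.
    r = A @ x - b
    return 0.5 * float(r @ r)

def scgd_sphere(A, b, x0, steps=2000, alpha=0.01, beta=0.1, noise=0.1, seed=0):
    # Hypothetical sketch: alpha is the step size for x, beta the averaging
    # weight for the tracking variable y. Not taken from the paper.
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    y = A @ x  # initialize the tracker (exact inner value, for simplicity)
    for _ in range(steps):
        # Noisy oracles: inner Jacobian sample and outer gradient sample.
        A_xi = A + noise * rng.standard_normal(A.shape)
        b_phi = b + noise * rng.standard_normal(b.shape)
        grad_f = y - b_phi                 # outer gradient at the tracked value y
        egrad = A_xi.T @ grad_f            # chain rule in the ambient space
        rgrad = proj_tangent(x, egrad)     # Riemannian gradient on the sphere
        x = retract(x, -alpha * rgrad)
        # Update the running estimate of g(x) with a fresh inner sample.
        A_xi2 = A + noise * rng.standard_normal(A.shape)
        y = (1.0 - beta) * y + beta * (A_xi2 @ x)
    return x, y

A = np.diag([1.0, 2.0, 3.0])
b = A @ np.array([1.0, 0.0, 0.0])          # minimizer on the sphere is e_1
x0 = np.ones(3) / np.sqrt(3.0)
x_final, y_final = scgd_sphere(A, b, x0)
print(objective(A, b, x0), objective(A, b, x_final))
```

Because the outer gradient in this toy problem is linear in `y`, the sketch only shows the mechanics of the update. The bias the abstract refers to becomes material when the outer gradient is nonlinear in the inner value.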

Paper Structure (21 sections, 5 theorems, 72 equations, 2 figures, 2 algorithms)

Introduction
Motivating example.
Related work
Manifold optimization.
Stochastic compositional optimization.
Contributions
Preliminaries
Two-level Riemannian composition
Algorithm development motivated by ODE analysis
Iteration complexity of the two-level R-SCGD method
Multi-level Riemannian composition
Iteration complexity of the multi-level R-SCGD method
Numerical studies
Conclusion
Complexity analysis of the two-level R-SCGD
...and 6 more sections

Key Result

Lemma 2.1

If the random sample oracle for derivatives (gradients) satisfies and random variable $\phi$ is independent of $\xi$, then

Figures (2)

Figure 1: Performance of the proposed R-SCGD algorithm compared to the Riemannian SGD, for the norm of the Riemannian gradient. (Left) The first setting with 81 states. (Right) The second setting with 121 states.
Figure 2: Performance of the proposed R-SCGD algorithm compared to the Riemannian SGD. Left and right plots illustrate the 81- and 121-state settings, respectively. (Top) Inner function approximation bias. (Bottom) Function value gap.

Theorems & Definitions (15)

Definition 1
Definition 2: Riemannian gradient
Definition 3: Adjoint of an operator
Remark 2.1
Lemma 2.1
Lemma 2.2
Theorem 2.1: Two-level R-SCGD
Remark 3.1
Lemma 3.1
Theorem 3.1: Multi-level R-SCGD
...and 5 more
