Riemannian Stochastic Gradient Method for Nested Composition Optimization

Dewei Zhang, Sam Davanloo Tajbakhsh

TL;DR

This work presents a Riemannian Stochastic Composition Gradient Descent method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $\mathcal{O}\left(\epsilon^{-2}\right)$ calls to the stochastic gradient oracle of the outer function and stochastic function and gradient oracles of the inner function.

Abstract

This work considers the optimization of nested compositions of functions over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning and model customization in meta-learning. Standard Riemannian stochastic gradient methods for non-compositional optimization cannot be applied directly, because stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structures, with the same $\mathcal{O}(\epsilon^{-2})$ complexity for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated on a policy evaluation problem in reinforcement learning.
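
To make the bias issue concrete, the following is a minimal sketch of a tracking-based compositional stochastic gradient step on the unit sphere. It is not the paper's R-SCGD algorithm, whose exact update rules are not reproduced here. It combines the well-known Euclidean idea of keeping a running average of the inner function value with a tangent-space projection and a normalization retraction. The test problem (a linear inner map $g(x) = Ax$, a log-cosh outer loss with a nonlinear gradient), the function names, the step sizes `alpha` and `beta`, and the noise model are all assumptions chosen for illustration.

```python
import numpy as np


def proj_tangent(x, v):
    """Project v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x


def retract(x, v):
    """Retraction on the unit sphere: move along tangent vector v, then renormalize."""
    z = x + v
    return z / np.linalg.norm(z)


def riem_grad_norm(A, b, x):
    """Norm of the exact Riemannian gradient of F(x) = sum(log(cosh(A x - b)))."""
    return np.linalg.norm(proj_tangent(x, A.T @ np.tanh(A @ x - b)))


def rscgd_sphere(A, b, x0, steps=2000, alpha=0.01, beta=0.1, noise=0.1, seed=0):
    """Illustrative tracking-based compositional SGD on the sphere (assumed setup).

    Inner function: g(x) = E[(A + noise * N) x] = A x.
    Outer function: f(y) = E[sum(log(cosh(y - b))) + noise * <n, y>],
    so the stochastic outer gradient tanh(y - b) + noise * n is unbiased at a fixed y,
    but it is nonlinear in y, which makes plugging in a noisy sample of g(x) biased.
    """
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    # Initialize the tracking variable with one noisy inner-function sample.
    y = (A + noise * rng.standard_normal(A.shape)) @ x
    for _ in range(steps):
        A_val = A + noise * rng.standard_normal(A.shape)  # inner value sample
        A_jac = A + noise * rng.standard_normal(A.shape)  # inner Jacobian sample
        n = rng.standard_normal(b.shape)                  # outer gradient noise
        # Running average that tracks g(x) and damps the inner sampling noise.
        y = (1.0 - beta) * y + beta * (A_val @ x)
        # Chain rule with the tracked value: J^T * grad f(y).
        egrad = A_jac.T @ (np.tanh(y - b) + noise * n)
        # Riemannian direction: project onto the tangent space, then retract.
        x = retract(x, -alpha * proj_tangent(x, egrad))
    return x, y
```

One way to try it is to draw a small random `A` and `b`, call `rscgd_sphere(A, b, x0)`, and compare `riem_grad_norm(A, b, x0 / np.linalg.norm(x0))` with `riem_grad_norm(A, b, x)` at the returned point. The running average `y` is the design choice that matters here: evaluating the outer gradient at a single noisy sample of the inner function would give a biased direction, which is the obstacle the abstract describes.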

Paper Structure

This paper contains 21 sections, 5 theorems, 72 equations, 2 figures, 2 algorithms.

Key Result

Lemma 2.1

If the random sample oracle for derivatives (gradients) satisfies and random variable $\phi$ is independent of $\xi$, then

Figures (2)

  • Figure 1: Performance of the proposed R-SCGD algorithm compared with Riemannian SGD, measured by the norm of the Riemannian gradient. (Left) The first setting with 81 states. (Right) The second setting with 121 states.
  • Figure 2: Performance of the proposed R-SCGD algorithm compared with Riemannian SGD. Left and right plots show the 81- and 121-state settings, respectively. (Top) Inner function approximation bias. (Bottom) Function value gap.

Theorems & Definitions (15)

  • Definition 1
  • Definition 2: Riemannian gradient
  • Definition 3: Adjoint of an operator
  • Remark 2.1
  • Lemma 2.1
  • Lemma 2.2
  • Theorem 2.1: Two-level R-SCGD
  • Remark 3.1
  • Lemma 3.1
  • Theorem 3.1: Multi-level R-SCGD
  • ...and 5 more