Riemannian Bilevel Optimization

Sanchayan Dutta, Xiang Cheng, Suvrit Sra

TL;DR

This work addresses Riemannian bilevel optimization by studying outer objectives $F(x)=f(x,y^*(x))$ with inner problems $y^*(x)=\arg\min_{y} g(x,y)$ on manifolds, aiming to avoid second-order hypergradients. It introduces RF$^2$SA, a fully first-order, single-loop method based on a constrained reformulation using $\mathcal{L}_{\lambda}(x,y)=f(x,y)+\lambda(g(x,y)-g^*(x))$ and a growing multiplier $\lambda_k$, enabling gradient-based updates on manifolds without access to second-order derivatives. The paper establishes non-asymptotic convergence to $\epsilon$-stationarity with precise rates: $\tilde{O}(\epsilon^{-2/7})$ or worse depending on gradient noise, notably $\tilde{O}(\epsilon^{-2/3})$ in the noiseless case, and extends the results to Alexandrov spaces to account for curvature. This geometry-aware, fully first-order approach broadens applicability to non-strongly convex inner problems and manifold-constrained settings, offering a practical alternative to hypergradient-based methods in Riemannian bilevel optimization.
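To see why the penalty function $\mathcal{L}_{\lambda}$ needs only first-order information, the following worked equations may help. They are a sketch in Euclidean notation under the assumption that $g(x,\cdot)$ is strongly convex; on a manifold the gradients would presumably be replaced by Riemannian gradients. The symbols $y_{\lambda}^{*}(x)$, $\mathcal{L}_{\lambda}^{*}$, $A$ and $b$ are introduced here for illustration and are not taken from the summary above. Writing $y_{\lambda}^{*}(x)=\arg\min_{y}\mathcal{L}_{\lambda}(x,y)$ and $\mathcal{L}_{\lambda}^{*}(x)=\mathcal{L}_{\lambda}(x,y_{\lambda}^{*}(x))$, the envelope argument (using $\nabla_y g(x,y^*(x))=0$ and $\nabla_y \mathcal{L}_{\lambda}(x,y_{\lambda}^{*}(x))=0$) gives

$$
\nabla g^{*}(x)=\nabla_x g(x,y^{*}(x)),
\qquad
\nabla \mathcal{L}_{\lambda}^{*}(x)=\nabla_x f(x,y_{\lambda}^{*}(x))+\lambda\bigl(\nabla_x g(x,y_{\lambda}^{*}(x))-\nabla_x g(x,y^{*}(x))\bigr).
$$

No Hessian or Jacobian of $g$ appears. As a hypothetical toy instance, take $f(x,y)=\tfrac12\|y-b\|^2$ and $g(x,y)=\tfrac12\|y-Ax\|^2$. Then

$$
y^{*}(x)=Ax,
\qquad
y_{\lambda}^{*}(x)=\frac{b+\lambda Ax}{1+\lambda},
\qquad
\nabla \mathcal{L}_{\lambda}^{*}(x)=\frac{\lambda}{1+\lambda}\,A^{\top}(Ax-b),
\qquad
\nabla F(x)=A^{\top}(Ax-b),
$$

so that $\nabla F(x)-\nabla \mathcal{L}_{\lambda}^{*}(x)=\tfrac{1}{1+\lambda}\nabla F(x)$ and $\|y_{\lambda}^{*}(x)-y^{*}(x)\|=\tfrac{1}{1+\lambda}\|b-Ax\|$. In this example both the gradient bias and the distance between the two inner solutions shrink like $O(1/\lambda)$, which is the reason for letting the multiplier $\lambda_k$ grow.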

Abstract

We develop new algorithms for Riemannian bilevel optimization. We focus in particular on batch and stochastic gradient-based methods, with the explicit goal of avoiding second-order information such as Riemannian hyper-gradients. We propose and analyze $\mathrm{RF^2SA}$, a method that leverages first-order gradient information to navigate the complex geometry of Riemannian manifolds efficiently. Notably, $\mathrm{RF^2SA}$ is a single-loop algorithm, and thus easier to implement and use. Under various setups, including stochastic optimization, we provide explicit convergence rates for reaching $\epsilon$-stationary points. We also address the challenge of optimizing over Riemannian manifolds with constraints by adjusting the multiplier in the Lagrangian, ensuring convergence to the desired solution without requiring access to second-order derivatives.
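The single-loop, first-order structure described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's Algorithm $\mathrm{RF^2SA}$: the problem instance (outer variable on the unit sphere, inner variable in flat $\mathbb{R}^3$, quadratic $f$ and $g$), the helper names, the step sizes and the linear multiplier schedule are all assumptions chosen for illustration. It keeps two inner iterates, `y` for the penalized objective $f+\lambda g$ and `z` for $g$ alone, and updates them together with `x` in one loop using gradients only.

```python
import numpy as np

# Hypothetical toy instance (not from the paper):
#   inner  g(x, y) = 0.5 * ||y - A x||^2   (so y*(x) = A x)
#   outer  f(x, y) = 0.5 * ||y - b||^2     (so F(x) = 0.5 * ||A x - b||^2)
A = np.diag([1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 0.0])

def grad_f(x, y):
    """Euclidean gradients of f with respect to (x, y)."""
    return np.zeros_like(x), y - b

def grad_g(x, y):
    """Euclidean gradients of g with respect to (x, y)."""
    r = y - A @ x
    return -A.T @ r, r

def sphere_rgrad(x, egrad):
    """Riemannian gradient on the unit sphere: project onto the tangent space at x."""
    return egrad - (x @ egrad) * x

def sphere_exp(x, v):
    """Exponential map on the unit sphere for a tangent vector v at x."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x
    x_new = np.cos(nv) * x + np.sin(nv) * (v / nv)
    return x_new / np.linalg.norm(x_new)  # guard against round-off drift

def first_order_bilevel(x0, steps=3000, eta=0.02, lam0=1.0, delta=0.01):
    """Single-loop penalty scheme with a growing multiplier (illustrative step sizes)."""
    x = x0 / np.linalg.norm(x0)
    y = A @ x  # tracks argmin_y f(x, y) + lam * g(x, y)
    z = A @ x  # tracks argmin_y g(x, y)
    lam = lam0
    for _ in range(steps):
        # One gradient step on g for z.
        z = z - 0.5 * grad_g(x, z)[1]
        # One gradient step on f + lam * g for y; the step shrinks as lam grows.
        y = y - (0.5 / (1.0 + lam)) * (grad_f(x, y)[1] + lam * grad_g(x, y)[1])
        # First-order surrogate of the hypergradient, then a step along the sphere.
        h = grad_f(x, y)[0] + lam * (grad_g(x, y)[0] - grad_g(x, z)[0])
        x = sphere_exp(x, -eta * sphere_rgrad(x, h))
        lam += delta
    return x, y, z, lam

def outer_rgrad_norm(x):
    """Norm of the Riemannian gradient of F(x) = f(x, y*(x)) on the sphere."""
    return np.linalg.norm(sphere_rgrad(x, A.T @ (A @ x - b)))

x_final, y_final, z_final, lam_final = first_order_bilevel(np.array([1.0, 1.0, 1.0]))
print("x:", x_final, "| grad norm:", outer_rgrad_norm(x_final), "| lambda:", lam_final)
```

In this instance the two inner updates contract at the same rate, so their tracking errors largely cancel in the difference `grad_g(x, y)[0] - grad_g(x, z)[0]`. That is a convenience of the toy problem; in general the relative step sizes of `x`, `y` and `z` have to be balanced against the growth of the multiplier.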

Paper Structure

This paper contains 26 sections, 10 theorems, 140 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Theorem 1

There exist choices of hyperparameters of Algorithm RF$^{2}$SA such that the following stationarity guarantees hold:

Figures (1)

  • Figure 1: $y_{k}$ should move faster than $y_{\lambda_{k}}^{*}(x_{k})$, remaining within an $O(1/\lambda_{k})$-ball around $y_{\lambda_{k}}^{*}(x_{k})$.

Theorems & Definitions (22)

  • Theorem 1: Informal
  • Definition 1: $\epsilon$-stationary point
  • Lemma 1
  • Lemma 2
  • Theorem 2: Alexandrov Space Version
  • Corollary 3
  • proof
  • proof
  • proof
  • proof
  • ...and 12 more