A Framework for Bilevel Optimization on Riemannian Manifolds

Andi Han; Bamdev Mishra; Pratik Jawanpuria; Akiko Takeda

TL;DR

This study introduces a framework for solving bilevel optimization problems in which both the lower- and upper-level variables are constrained to Riemannian manifolds, and extends it to stochastic bilevel optimization and to general retractions.

Abstract

Bilevel optimization has gained prominence in various applications. In this study, we introduce a framework for solving bilevel optimization problems, where the variables in both the lower and upper levels are constrained on Riemannian manifolds. We present several hypergradient estimation strategies on manifolds and analyze their estimation errors. Furthermore, we provide comprehensive convergence and complexity analyses for the proposed hypergradient descent algorithm on manifolds. We also extend our framework to encompass stochastic bilevel optimization and incorporate the use of general retraction. The efficacy of the proposed framework is demonstrated through several applications.

Paper Structure (40 sections, 19 theorems, 95 equations, 3 figures, 4 tables, 4 algorithms)

Introduction
Preliminaries and notations
Proposed Riemannian hypergradient algorithm
Hypergradient estimation
Theoretical analysis
Extension to stochastic bilevel optimization
Extension to retraction
Experiments
Synthetic problem
Hyper-representation over SPD manifolds
Riemannian meta learning
Unsupervised domain adaptation
Conclusion
Riemannian geometries of considered manifolds
Important Lemmas
...and 25 more sections

Key Result

Proposition 1

The differential of $y^*(x)$ and the Riemannian hypergradient of $F(x)$ are given by

Figures (3)

Figure 1: Panels (a) and (b) plot the upper-level objective (Upper Objective) for the different hypergradient estimation strategies. The HINV and CG strategies converge fastest, followed by NS and AD. Panel (c) shows the corresponding estimation errors. Panel (d) shows the robustness of the NS approximation error across different $\gamma$ and $T$ values.
Figure 2: Panels (a), (b), and (c) show the performance of RHGD on the hyper-representation problems with SPD networks. Panel (d) shows that the proposed RHGD algorithms generalize well relative to the projected gradient (PHGD) baselines on the MiniImageNet dataset.
Figure : Riemannian stochastic bilevel optimization with Hessian inverse.
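
The strategies compared in Figure 1 differ mainly in how an inverse lower-level Hessian is applied to a vector. The following is a minimal Euclidean sketch, not the paper's Riemannian implementation. It assumes a small symmetric positive definite matrix `H` standing in for the lower-level Hessian, reads NS as a truncated Neumann series with step size `gamma` and `T` terms, and reads CG as the conjugate gradient method. All function names are hypothetical.

```python
def matvec(H, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def neumann_solve(H, v, gamma, T):
    """Approximate H^{-1} v by gamma * sum_{i=0}^{T-1} (I - gamma*H)^i v.

    Assumption: 0 < gamma < 2 / lambda_max(H), so the series converges.
    """
    term = list(v)   # holds (I - gamma*H)^i v, starting at i = 0
    total = list(v)  # running sum of the terms
    for _ in range(T - 1):
        h_term = matvec(H, term)
        term = [t - gamma * ht for t, ht in zip(term, h_term)]
        total = [s + t for s, t in zip(total, term)]
    return [gamma * s for s in total]


def cg_solve(H, v, iters, tol=1e-20):
    """Approximate H^{-1} v with conjugate gradient, starting from zero."""
    x = [0.0] * len(v)
    r = list(v)      # residual v - H x
    p = list(r)      # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs <= tol:  # squared residual norm is already negligible
            break
        hp = matvec(H, p)
        alpha = rs / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

In this sketch, raising `T` or running more CG iterations trades computation for a smaller error in the inverse-Hessian-vector product, which is the kind of trade-off the estimation-error panels of Figure 1 report for the Riemannian setting.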

Theorems & Definitions (35)

Proposition 1
Definition 1: Lipschitzness
Definition 2: $\epsilon$-stationary point
Lemma 1: Hypergradient approximation error bound
Theorem 1
Theorem 2
Theorem 3
Proposition 2 [boumal2023introduction]
Lemma 2 [sun2019escaping, han2023riemna]
Lemma 3: Trigonometric distance bound [zhang2016first, zhang2016riemannian, han2021riemannian]
...and 25 more
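
Several of the results listed above concern the hypergradient of $F(x) = f(x, y^*(x))$ and stationary points of $F$. For orientation only, the sketch below uses the standard Euclidean implicit-differentiation hypergradient, $\nabla F(x) = \nabla_x f - \nabla^2_{xy} g \, (\nabla^2_{yy} g)^{-1} \nabla_y f$, on a scalar toy problem. It is not the Riemannian formula of Proposition 1, which this listing does not reproduce. The toy objectives $g(x, y) = \tfrac{1}{2} a y^2 - x y$ and $f(x, y) = \tfrac{1}{2}(y - c)^2 + \tfrac{\lambda}{2} x^2$, and every function name, are assumptions made for illustration.

```python
def lower_solution(x, a):
    """argmin_y g(x, y) for g(x, y) = 0.5*a*y**2 - x*y, assuming a > 0."""
    return x / a


def upper_objective(x, a, c, lam):
    """F(x) = f(x, y*(x)) with f(x, y) = 0.5*(y - c)**2 + 0.5*lam*x**2."""
    y = lower_solution(x, a)
    return 0.5 * (y - c) ** 2 + 0.5 * lam * x ** 2


def hypergradient(x, a, c, lam):
    """Euclidean hypergradient via the implicit function theorem."""
    y = lower_solution(x, a)
    grad_x_f = lam * x   # partial derivative of f in x
    grad_y_f = y - c     # partial derivative of f in y
    hess_yy_g = a        # second derivative of g in y
    cross_xy_g = -1.0    # mixed second derivative of g
    return grad_x_f - cross_xy_g * grad_y_f / hess_yy_g


def hypergradient_descent(x0, a, c, lam, step, iters):
    """Plain gradient descent on F using the hypergradient above."""
    x = x0
    for _ in range(iters):
        x -= step * hypergradient(x, a, c, lam)
    return x
```

On a manifold, the subtraction in the update step would be replaced by an exponential map or a retraction, and the derivatives by their Riemannian counterparts; the toy above only illustrates the structure of the computation.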