Online Bilevel Optimization: Regret Analysis of Online Alternating Gradient Methods

Davoud Ataee Tarzanagh; Parvin Nazari; Bojian Hou; Li Shen; Laura Balzano

TL;DR

This work introduces online bilevel optimization (OBO) and formalizes notions of bilevel regret, in both dynamic and static variants, using outer and inner path-length regularities to capture nonstationarity. It proposes Online Alternating Gradient Descent (OAGD), which updates the outer variable with a time-averaged hypergradient while performing inner updates, and achieves regret bounds that scale with $S_{p,T}=P_{p,T}+Y_{p,T}$. The authors establish regret guarantees in strongly convex, convex, and non-convex settings, including lower bounds and a bilevel local regret bound for non-convex outer losses. They validate OBO experimentally on online hyperparameter learning for dynamic regression, online parametric loss tuning for imbalanced data, and online meta-learning, reporting competitive performance and favorable runtimes relative to baselines.
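To make the quantity $S_{p,T}=P_{p,T}+Y_{p,T}$ concrete, here is a minimal Python sketch. It assumes the usual online-optimization convention that a path-length is the sum of $p$-th powers of distances between consecutive minimizers. This summary does not spell out the definitions of $P_{p,T}$ and $Y_{p,T}$, so that convention, the function names `path_length` and `combined_path_length`, and the toy sequences are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np


def path_length(points, p=1):
    """Sum over t of ||z_t - z_{t-1}||^p for a sequence of points (assumed convention)."""
    points = np.asarray(points, dtype=float)
    if len(points) < 2:
        return 0.0
    # Treat each entry as a vector, even if scalars were passed in.
    points = points.reshape(len(points), -1)
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return float(np.sum(steps ** p))


def combined_path_length(outer_minimizers, inner_minimizers, p=1):
    """S = P + Y: outer path-length plus inner path-length (assumed convention)."""
    return path_length(outer_minimizers, p) + path_length(inner_minimizers, p)


# Hypothetical minimizer sequences for T = 3 rounds.
x_stars = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]  # outer minimizers
y_stars = [[1.0], [2.0], [0.0]]                 # inner minimizers

s_1 = combined_path_length(x_stars, y_stars, p=1)
s_2 = combined_path_length(x_stars, y_stars, p=2)
print(s_1, s_2)
```

A stationary problem, in which the minimizers never move, gives a path-length of zero under this convention, which is why bounds of this kind are read as measuring how much the environment drifts.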

Abstract

This paper introduces \textit{online bilevel optimization} in which a sequence of time-varying bilevel problems is revealed one after the other. We extend the known regret bounds for online single-level algorithms to the bilevel setting. Specifically, we provide new notions of \textit{bilevel regret}, develop an online alternating time-averaged gradient method that is capable of leveraging smoothness, and give regret bounds in terms of the path-length of the inner and outer minimizer sequences.
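The alternating scheme described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm as stated. It assumes a quadratic inner objective $g_t(x,y)=\tfrac12\|y-x-c_t\|^2$ and outer objective $f_t(x,y)=\tfrac12\|y-b_t\|^2+\tfrac{\lambda}{2}\|x\|^2$, for which the hypergradient has the closed form $\lambda x + (y-b_t)$, and it averages the last `window` hypergradients uniformly. The paper's time-averaged hypergradient may weight or evaluate past terms differently. The name `oagd_toy`, the step sizes, and the data are all hypothetical.

```python
from collections import deque

import numpy as np


def oagd_toy(b_seq, c_seq, window=3, alpha=0.2, beta=0.5, inner_steps=1, lam=0.1):
    """Alternating inner/outer gradient steps on a toy quadratic bilevel stream."""
    dim = b_seq.shape[1]
    x = np.zeros(dim)  # outer variable
    y = np.zeros(dim)  # inner variable
    recent = deque(maxlen=window)  # sliding window of past hypergradients
    outer_losses = []
    for b_t, c_t in zip(b_seq, c_seq):
        # Inner update: gradient step(s) on g_t(x, y) = 0.5 * ||y - x - c_t||^2.
        for _ in range(inner_steps):
            y = y - beta * (y - x - c_t)
        # Outer loss suffered this round at the current (x, y).
        outer_losses.append(0.5 * np.sum((y - b_t) ** 2) + 0.5 * lam * np.sum(x ** 2))
        # Closed-form hypergradient for this toy problem, using y in place of y*_t(x).
        recent.append(lam * x + (y - b_t))
        # Outer update with the uniform average over the window (assumed weighting).
        x = x - alpha * np.mean(np.stack(list(recent)), axis=0)
    return x, y, outer_losses


# Hypothetical stationary stream: the same (b, c) is revealed every round.
T = 500
b_seq = np.tile([1.0, -2.0], (T, 1))
c_seq = np.tile([0.5, 0.5], (T, 1))
x_final, y_final, losses = oagd_toy(b_seq, c_seq)

# For this toy problem the bilevel minimizer is x* = (b - c) / (1 + lam).
x_star = (b_seq[0] - c_seq[0]) / (1.0 + 0.1)
print(x_final, x_star)
```

On a stationary stream the iterates settle at the bilevel minimizer, which is a useful sanity check before trying time-varying `b_seq` and `c_seq`.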

Paper Structure

This paper contains 58 sections, 21 theorems, 195 equations, 14 figures, 2 tables, 1 algorithm.

Introduction
Background: Online Single-Level Optimization
Stackelberg Game and Online Bilevel Optimization
Related Work
Algorithm and Regret Bounds
OBO with (Hyper-)Gradient Information
Main Results
Local Regret Minimization
Experimental Results
Online Hyperparameters Learning for Dynamic Regression
Online Parametric Loss Tuning for Imbalanced Data
Online Meta-Learning
Conclusion
Addendum to the Introduction: Preliminaries and Notations
On the Comparability of Dynamic Metrics
...and 43 more sections

Key Result

Lemma 3

Under Assumption assu:f, for all $t \in [T]$, ${\bf{x}}, {\bf{x}}' \in \mathcal{X}$, and ${\bf{y}} \in {\mathbb{R}}^{d_2}$, we have [displayed inequalities not recoverable from the source]. Here, $L_{{\bf{y}}}=\mathcal{O}(\kappa_g)$, $M_f=\mathcal{O}(\kappa_g^2)$, and $L_f= \mathcal{O}(\kappa_g^3)$.

Figures (14)

Figure 1: Performance of OAGD in online hyperparameter learning over five runs. The left and middle figures show OBO's regret with three comparators and a fixed comparator, respectively. The right figure illustrates the outer problem's trajectories and the performance of OAGD and offline HO in learning the hyperparameter $x_1$.
Figure 2: Performance comparison (mean$\pm$std) on loss tuning for imbalanced MNIST data across five runs.
Figure 3: Performance comparison (mean$\pm$std) on loss tuning for imbalanced MNIST data across five runs, considering induced distribution shift.
Figure 4: Performance comparison (mean$\pm$std) on meta-learning for FC100 data across five runs.
Figure 5: Performance comparison (mean$\pm$std) on parametric loss tuning for imbalanced Tadpole data over five runs. We compare our OAGD ($w=5, 10$) with AutoBalance and Single-Level OGD. OAGD achieves comparable balanced testing accuracy to AutoBalance but with a reduced runtime.
...and 9 more figures

Theorems & Definitions (38)

Definition 1: Time-Averaged Hypergradient
Remark 2
Lemma 3
Theorem 4: Strongly-Convex Dynamic
Theorem 5: Lower Bound
Theorem 6: Strongly-Convex Static
Theorem 7: Convex Dynamic
Theorem 8: Convex Static
Theorem 9: Non-convex Local
Lemma 10
...and 28 more
