Online Bilevel Optimization: Regret Analysis of Online Alternating Gradient Methods

Davoud Ataee Tarzanagh, Parvin Nazari, Bojian Hou, Li Shen, Laura Balzano

TL;DR

This work introduces online bilevel optimization (OBO) and formalizes notions of bilevel regret, including dynamic and static variants, together with outer and inner path-length regularities that capture nonstationarity. It proposes Online Alternating Gradient Descent (OAGD), which updates the outer variable with a time-averaged hypergradient while alternating with inner updates. OAGD achieves regret bounds that scale with $S_{p,T}=P_{p,T}+Y_{p,T}$, the sum of the outer and inner path-lengths. The authors establish guarantees in the strongly convex, convex, and non-convex settings, including lower bounds and a bilevel local regret bound for non-convex outer losses. They validate the approach experimentally on online hyperparameter learning for dynamic regression, online parametric loss tuning for imbalanced data, and online meta-learning, reporting competitive performance and favorable runtimes relative to baselines.
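The alternating scheme summarized above can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the toy quadratic problem, the hypothetical function name `oagd_sketch`, the single inner gradient step per round, the uniform weights over a window of `w` past hypergradients, and the step sizes `alpha` and `beta`. The quadratic inner problem is chosen so that the implicit-function-theorem hypergradient has a closed form.

```python
from collections import deque

import numpy as np


def oagd_sketch(targets, B, w=5, alpha=0.1, beta=0.5, lam=0.1):
    """Toy OAGD-style loop (hypothetical); returns (x_1..x_T, final x).

    Toy bilevel problem, chosen so every derivative is closed-form:
      inner  g_t(x, y) = 0.5 * ||y - B x||^2               -> y_t^*(x) = B x
      outer  f_t(x, y) = 0.5 * ||y - c_t||^2 + 0.5 * lam * ||x||^2
    """
    d2, d1 = B.shape
    x, y = np.zeros(d1), np.zeros(d2)
    window = deque(maxlen=w)      # keeps only the last w approximate hypergradients
    xs = []
    for c_t in targets:           # round t reveals (f_t, g_t) through c_t
        xs.append(x.copy())       # the learner commits to x_t before updating
        # Inner update: one gradient step on y -> g_t(x_t, y).
        y = y - beta * (y - B @ x)
        # Approximate hypergradient at (x_t, y_{t+1}):
        #   grad_x f - grad2_xy g [grad2_yy g]^{-1} grad_y f,
        # where grad2_yy g = I and grad2_xy g = -B^T for this toy g.
        hypergrad = lam * x + B.T @ (y - c_t)
        window.append(hypergrad)
        # Outer update with the time-averaged (uniform-window) hypergradient.
        x = x - alpha * np.mean(list(window), axis=0)
    return np.array(xs), x
```

With a static target $c$, the outer iterate should settle at the minimizer of $F(x)=f(x,y^*(x))$, which for this toy problem is $x^*=(B^\top B+\lambda I)^{-1}B^\top c$. Passing a drifting sequence of targets instead makes the path-lengths, and hence the regret, nonzero.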

Abstract

This paper introduces online bilevel optimization, in which a sequence of time-varying bilevel problems is revealed one after the other. We extend the known regret bounds for online single-level algorithms to the bilevel setting. Specifically, we provide new notions of bilevel regret, develop an online alternating time-averaged gradient method that can leverage smoothness, and give regret bounds in terms of the path-length of the inner and outer minimizer sequences.
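The regret and path-length quantities mentioned in the abstract can be made concrete with a short sketch. It assumes the common form $\sum_{t=2}^{T}\|z_t-z_{t-1}\|^p$ for a path-length and a dynamic bilevel regret of the form $\sum_t [F_t(x_t)-F_t(x_t^*)]$ with $F_t(x)=f_t(x,y_t^*(x))$. The paper's exact definitions (for example, at which point the inner minimizer is evaluated) may differ, and the helper names `path_length` and `dynamic_bilevel_regret` are hypothetical.

```python
import numpy as np


def path_length(minimizers, p=1):
    """Path-length sum_{t=2}^T ||z_t - z_{t-1}||^p of a minimizer sequence."""
    z = np.asarray(minimizers, dtype=float)
    z = z.reshape(len(z), -1)                  # treat scalar minimizers as 1-D points
    steps = np.linalg.norm(np.diff(z, axis=0), axis=1)
    return float(np.sum(steps ** p))


def dynamic_bilevel_regret(F_played, F_best):
    """sum_t [F_t(x_t) - F_t(x_t^*)], given per-round values of F_t = f_t(., y_t^*(.))."""
    return float(np.sum(np.asarray(F_played, dtype=float) - np.asarray(F_best, dtype=float)))


# Outer minimizers jump once by a vector of length 5; inner minimizers move by 1, then 2.
x_stars = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
y_stars = [0.0, 1.0, 3.0]
P1 = path_length(x_stars, p=1)   # 5.0
Y1 = path_length(y_stars, p=1)   # 3.0
S1 = P1 + Y1                     # 8.0, the S_{1,T} combination
```

Taking $p=2$ instead weights large jumps more heavily: the inner sequence above gives $1^2+2^2=5$.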

Paper Structure (58 sections, 21 theorems, 195 equations, 14 figures, 2 tables, 1 algorithm)

Key Result

Lemma 3

Under Assumption assu:f, for all $t \in [T]$, ${\bf{x}}, {\bf{x}}' \in \mathcal{X}$, and ${\bf{y}} \in {\mathbb{R}}^{d_2}$, the following bounds hold: … Here, $L_{{\bf{y}}}=\mathcal{O}(\kappa_g)$, $M_f=\mathcal{O}(\kappa_g^2)$, and $L_f= \mathcal{O}(\kappa_g^3)$, where $\kappa_g$ denotes the condition number of the inner objective.
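The $L_{{\bf{y}}}=\mathcal{O}(\kappa_g)$ scaling can be checked numerically in the simplest case. This sketch assumes a quadratic inner objective $g(x,y)=\tfrac12 y^\top A y - y^\top C x$ with $\mu I \preceq A \preceq \ell I$ and $\|C\|_2=\ell$, so that $y^*(x)=A^{-1}Cx$ and $\kappa_g=\ell/\mu$. It illustrates only the Lipschitz bound on the inner solution map, not the lemma's other two bounds, and the function name `inner_solution_lipschitz` is hypothetical.

```python
import numpy as np


def inner_solution_lipschitz(mu=0.5, ell=4.0, d=5, tight=False, seed=0):
    """Return (exact Lipschitz constant of y^*(x) = A^{-1} C x, kappa = ell / mu)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal basis
    A = Q @ np.diag(np.linspace(mu, ell, d)) @ Q.T     # mu*I <= A <= ell*I (strong convexity, smoothness)
    if tight:
        C = ell * np.eye(d)                            # worst case: aligns with A's smallest eigenvalue
    else:
        C = rng.standard_normal((d, d))
        C *= ell / np.linalg.norm(C, 2)                # spectral norm of the cross term equals ell
    L_y = np.linalg.norm(np.linalg.solve(A, C), 2)     # ||A^{-1} C||_2, exact for a linear map
    return L_y, ell / mu
```

Since $\|A^{-1}C\|_2 \le \|A^{-1}\|_2\|C\|_2 = \ell/\mu$, the computed constant never exceeds $\kappa_g$, and the `tight=True` case attains it, so the $\mathcal{O}(\kappa_g)$ dependence cannot be improved in general.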

Figures (14)

  • Figure 1: Performance of OAGD in online hyperparameter learning over five runs. The left and middle figures show OBO's regret with three comparators and a fixed comparator, respectively. The right figure illustrates the outer problem's trajectories and the performance of OAGD and offline HO in learning the hyperparameter $x_1$.
  • Figure 2: Performance comparison (mean$\pm$std) on loss tuning for imbalanced MNIST data across five runs.
  • Figure 3: Performance comparison (mean$\pm$std) on loss tuning for imbalanced MNIST data across five runs, considering induced distribution shift.
  • Figure 4: Performance comparison (mean$\pm$std) on meta-learning for FC100 data across five runs.
  • Figure 5: Performance comparison (mean$\pm$std) on parametric loss tuning for imbalanced Tadpole data over five runs. We compare our OAGD ($w=5, 10$) with AutoBalance and Single-Level OGD. OAGD achieves comparable balanced testing accuracy to AutoBalance but with a reduced runtime.
  • ...and 9 more figures
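The imbalanced-data captions above report balanced testing accuracy. A minimal sketch, assuming the standard definition (the mean of per-class recall; the excerpt does not spell out the paper's exact metric), shows why it differs from plain accuracy under class imbalance. The function name `balanced_accuracy` is illustrative.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: each class counts equally regardless of its frequency."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == k] == k) for k in classes]
    return float(np.mean(recalls))
```

A classifier that always predicts the majority class on labels `[0, 0, 0, 0, 1]` has plain accuracy 0.8 but balanced accuracy 0.5 (recall 1.0 on class 0, 0.0 on class 1), which is why the metric is used for imbalanced MNIST and Tadpole.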

Theorems & Definitions (38)

  • Definition 1: Time-Averaged Hypergradient
  • Remark 2
  • Lemma 3
  • Theorem 4: Strongly-Convex Dynamic
  • Theorem 5: Lower Bound
  • Theorem 6: Strongly-Convex Static
  • Theorem 7: Convex Dynamic
  • Theorem 8: Convex Static
  • Theorem 9: Non-convex Local
  • Lemma 10
  • ...and 28 more
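The list pairs a time-averaged hypergradient (Definition 1) with a non-convex local regret result (Theorem 9). In single-level non-convex online learning, local regret sums the squared norm of a time-smoothed gradient, $\sum_t \|\tfrac1w\sum_{i=0}^{w-1}\nabla f_{t-i}(x_t)\|^2$, with losses before the first round treated as zero. The sketch below computes that quantity under the assumption that the bilevel version substitutes hypergradients for gradients. Uniform weights and exact gradient callables are simplifications; the paper's weighting and its use of inner iterates may differ, and `local_regret` is a hypothetical name.

```python
import numpy as np


def local_regret(grad_fns, xs, w):
    """sum_t || (1/w) * sum_{i=0}^{w-1} grad_{t-i}(x_t) ||^2.

    grad_fns[t] is the (hyper)gradient oracle revealed at round t and xs[t] is
    the iterate played at round t; rounds before the first contribute zero.
    """
    total = 0.0
    for t, x in enumerate(xs):
        # Average the last w oracles, all evaluated at the current iterate x_t.
        avg = sum(grad_fns[t - i](x) for i in range(min(w, t + 1))) / w
        total += float(np.dot(avg, avg))
    return total
```

Evaluating every past oracle at the current iterate is what makes this a measure of approximate stationarity for the smoothed loss, rather than a comparison against a fixed competitor.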