Fully First-Order Algorithms for Online Bilevel Optimization

Tingkai Jia, Cheng Chen

TL;DR

This work tackles non-convex-upper-level online bilevel optimization with drift by eliminating second-order dependence. It reformulates the bilevel problem as a single-level online problem with an inequality constraint and develops a fully first-order online bilevel optimizer (F^2OBO) that uses a time-varying multiplier to approximate the original problem without Hessian–vector products. An adaptive inner-iteration variant (AF^2OBO) further reduces dependence on inner-drift (H_{2,T}) by tuning inner accuracy, achieving regret bounds that are robust to aggressive distribution shift. The results show sublinear bilevel local regret under favorable parameter choices (τ>1), with O(T log T) per-iteration complexity, and a static-environment regime yielding near-optimal first-order convergence; AF^2OBO offers a favorable trade-off when inner drift is substantial.
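The reformulation described above can be sketched in assumed notation. The symbols $f_t$ and $g_t$ (outer and inner objectives at round $t$) and $y_t^*$ are not defined in this summary and are introduced here for illustration only; $\lambda_t$ and $\mathcal{X}$ follow Lemma 3.1. A penalty-style construction of this kind typically starts from the bilevel problem

$$
\min_{\mathbf{x}\in\mathcal{X}} \; f_t\big(\mathbf{x}, \mathbf{y}_t^*(\mathbf{x})\big), \qquad \mathbf{y}_t^*(\mathbf{x}) = \arg\min_{\mathbf{y}} g_t(\mathbf{x}, \mathbf{y}),
$$

rewrites it as a single-level problem with one inequality constraint,

$$
\min_{\mathbf{x}\in\mathcal{X},\, \mathbf{y}} \; f_t(\mathbf{x}, \mathbf{y}) \quad \text{s.t.} \quad g_t(\mathbf{x}, \mathbf{y}) - \min_{\mathbf{z}} g_t(\mathbf{x}, \mathbf{z}) \le 0,
$$

and then works with the Lagrangian for a multiplier $\lambda_t > 0$:

$$
\mathcal{L}_t(\mathbf{x}, \mathbf{y}; \lambda_t) = f_t(\mathbf{x}, \mathbf{y}) + \lambda_t \Big( g_t(\mathbf{x}, \mathbf{y}) - \min_{\mathbf{z}} g_t(\mathbf{x}, \mathbf{z}) \Big).
$$

When $g_t$ is strongly convex in its second argument, the inner minimizer is unique, so the gradient of $\mathcal{L}_t$ with respect to $\mathbf{x}$ involves only first-order gradients of $f_t$ and $g_t$; no Hessian-vector product appears. The paper's exact definitions may differ from this sketch.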

Abstract

In this work, we study non-convex-strongly-convex online bilevel optimization (OBO). Existing OBO algorithms are mainly based on hypergradient descent, which requires access to a Hessian-vector product (HVP) oracle and can incur high computational cost. By reformulating the original OBO problem as a single-level online problem with inequality constraints and constructing a sequence of Lagrangian functions, we eliminate the need for the HVPs that arise from implicit differentiation. Specifically, we propose a fully first-order algorithm for OBO and provide theoretical guarantees showing that it achieves regret of $O(1 + V_T + H_{2,T})$. Furthermore, we develop an improved variant with an adaptive inner-iteration scheme, which removes the dependence on the drift variation of the inner-level optimal solution and achieves regret of $O(\sqrt{T} + V_T)$. This regret bound is advantageous when $V_{T}\ge O(\sqrt{T})$.
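To make the "fully first-order" idea concrete, here is a minimal sketch on a static toy problem. It is not the paper's F^2OBO algorithm: the objectives, the fixed multiplier `lam`, the step sizes, and every function name are assumptions chosen for illustration. The sketch uses a penalty-style Lagrangian with two inner variables, `y` (approximately minimizing `f + lam * g`) and `z` (approximately minimizing `g`), and updates `x` using gradients only.

```python
# Toy bilevel problem (hypothetical, for illustration only):
#   outer: f(x, y) = 0.5*(y - 1)^2 + 0.5*x^2
#   inner: g(x, y) = 0.5*(y - x)^2, so y*(x) = x
# The true hyper-objective is F(x) = 0.5*(x - 1)^2 + 0.5*x^2, minimized at x = 0.5.

def grad_f(x, y):
    """Return (df/dx, df/dy) for the outer objective."""
    return x, y - 1.0

def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the inner objective."""
    return x - y, y - x

def first_order_bilevel(x0=0.0, lam=100.0, outer_steps=200,
                        inner_steps=20, outer_lr=0.2):
    """Gradient-only penalty method; no Hessian-vector products are used."""
    x, y, z = x0, x0, x0  # y and z are warm-started across outer steps
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # y approximately minimizes f(x, .) + lam * g(x, .)
            _, fy = grad_f(x, y)
            _, gy = grad_g(x, y)
            y -= 0.5 / (1.0 + lam) * (fy + lam * gy)
            # z approximately minimizes g(x, .)
            _, gz = grad_g(x, z)
            z -= 0.5 * gz
        # x-gradient of f(x, y) + lam * (g(x, y) - g(x, z))
        fx, _ = grad_f(x, y)
        gx_y, _ = grad_g(x, y)
        gx_z, _ = grad_g(x, z)
        x -= outer_lr * (fx + lam * (gx_y - gx_z))
    return x

x_hat = first_order_bilevel()
```

On this toy problem the penalized stationary point is `lam / (2 * lam + 1)`, which approaches the true minimizer 0.5 as `lam` grows; this gap is why such methods tie the multiplier to the target accuracy. In the online setting the objectives would change with each round, and the paper uses a time-varying multiplier rather than the fixed one assumed here.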

Paper Structure

This paper contains 18 sections, 15 theorems, 81 equations, 1 table, and 2 algorithms.

Key Result

Lemma 3.1

For any $\mathbf{x}\in\mathcal{X}$, and $\lambda_t\geq\frac{2L_{f,1}}{\mu_g}$ for all $t\in[T]$, we have where $D_1$ is some constant.

Theorems & Definitions (22)

  • Lemma 3.1: Based on Lemma 3.1 in kwon2023fullyfirstordermethodstochastic
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 3.4
  • Lemma 4.1
  • Theorem 4.2
  • Lemma A.1: Lemma 16 in tarzanagh2024onlinebileveloptimizationregret
  • Lemma A.2: Lemma 4.1 in chen2023near
  • Lemma A.3: Based on Lemma 3.2 in kwon2023fullyfirstordermethodstochastic
  • proof
  • ...and 12 more