Alternating Stochastic Variance-Reduced Algorithms with Optimal Complexity for Bilevel Optimization

Haimei Huo; Zhixun Su

TL;DR

The paper tackles nonconvex-strongly-convex bilevel optimization by deriving an explicit hypergradient and proposing two alternating variance-reduced algorithms, ALS-SPIDER and ALS-STORM. ALS-SPIDER combines SPIDER-based variance reduction with multi-step lower-level (LL) updates and an auxiliary variable that estimates the hypergradient, achieving the optimal sample complexity of $O(\epsilon^{-1.5})$ under standard assumptions. ALS-STORM replaces SPIDER with STORM to reduce per-iteration batch sizes while preserving the same $O(\epsilon^{-1.5})$ rate; it requires a large batch only in the initial iteration. Experiments on a synthetic bilevel problem and a data hyper-cleaning task complement the theory, showing practical efficiency improvements and the viability of the two-level alternating framework for bilevel problems.
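For orientation, the block below writes out the form of the hypergradient that is commonly used in the nonconvex-strongly-convex setting. It is a sketch in generic notation: $f$ (upper-level objective) and $g$ (lower-level objective) are assumed names, and the paper's own equations may differ in notation or numbering. The auxiliary variable mentioned above plays the role of $v^*(x)$, the solution of a linear system in the lower-level Hessian.

```latex
\begin{align*}
  &\min_{x} \; \Phi(x) := f\bigl(x, y^*(x)\bigr),
  \qquad y^*(x) = \arg\min_{y} \; g(x, y), \\
  &\nabla \Phi(x)
    = \nabla_x f\bigl(x, y^*(x)\bigr)
    - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)\, v^*(x), \\
  &v^*(x)
    = \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
      \nabla_y f\bigl(x, y^*(x)\bigr).
\end{align*}
```

Because $g(x, \cdot)$ is strongly convex, $\nabla^2_{yy} g$ is invertible, so $v^*(x)$ is well defined; estimating it with an auxiliary variable avoids forming the Hessian inverse explicitly.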

Abstract

This paper studies the unconstrained nonconvex-strongly-convex bilevel optimization problem. A common approach to solving this problem is to alternately update the upper-level and lower-level variables using (biased) stochastic gradients or their variants, with the lower-level variable updated for either one step or multiple steps. In this context, we propose two alternating stochastic variance-reduced algorithms, namely ALS-SPIDER and ALS-STORM, which introduce an auxiliary variable to estimate the hypergradient for updating the upper-level variable. ALS-SPIDER employs the SPIDER estimator for updating variables, while ALS-STORM is a modification of ALS-SPIDER designed to avoid using large batch sizes in every iteration. Theoretically, both algorithms can find an $\epsilon$-stationary point of the bilevel problem with a sample complexity of $O(\epsilon^{-1.5})$ for an arbitrary constant number of lower-level variable updates. To the best of our knowledge, they are the first algorithms to achieve the optimal complexity of $O(\epsilon^{-1.5})$ when performing multiple updates on the lower-level variable. Numerical experiments are conducted to illustrate the efficiency of our algorithms.
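To make the difference between the two estimators concrete, the following is a minimal sketch of the standard SPIDER and STORM gradient recursions on a single-level toy least-squares problem. It is not ALS-SPIDER or ALS-STORM: there is no lower-level problem or auxiliary variable, and every name, batch size, step size, and the toy objective itself are assumptions chosen for illustration. The point is only that SPIDER refreshes its estimate with a large batch periodically, whereas STORM uses a large batch once and then a momentum-corrected single-sample update.

```python
import random


def stoch_grad(x, sample):
    """Stochastic gradient of 0.5 * (a*x - b)**2 for one sample (a, b)."""
    a, b = sample
    return a * (a * x - b)


def draw(rng, n):
    """Draw n toy samples with a ~ U(0.5, 1.5) and b = 2a + small noise."""
    out = []
    for _ in range(n):
        a = rng.uniform(0.5, 1.5)
        out.append((a, 2.0 * a + rng.gauss(0.0, 0.1)))
    return out


def mean_grad(x, batch):
    """Mini-batch average of the stochastic gradient at x."""
    return sum(stoch_grad(x, s) for s in batch) / len(batch)


def spider_step(v_prev, x, x_prev, batch):
    """SPIDER recursion: add a gradient difference computed on the
    same mini-batch at the new and the previous iterate."""
    return v_prev + mean_grad(x, batch) - mean_grad(x_prev, batch)


def storm_step(v_prev, x, x_prev, sample, momentum):
    """STORM recursion with a momentum parameter in (0, 1]; the same
    sample is evaluated at the new and the previous iterate."""
    g_new = stoch_grad(x, sample)
    g_old = stoch_grad(x_prev, sample)
    return g_new + (1.0 - momentum) * (v_prev - g_old)


def run_spider(steps=200, lr=0.1, period=10, big=256, small=8, seed=0):
    rng = random.Random(seed)
    x = x_prev = 0.0
    v = 0.0
    for k in range(steps):
        if k % period == 0:
            # Periodic large-batch refresh of the estimator.
            v = mean_grad(x, draw(rng, big))
        else:
            v = spider_step(v, x, x_prev, draw(rng, small))
        x_prev, x = x, x - lr * v
    return x


def run_storm(steps=200, lr=0.1, momentum=0.1, big=256, seed=0):
    rng = random.Random(seed)
    x = 0.0
    # A large batch is used only to initialize the estimator.
    v = mean_grad(x, draw(rng, big))
    x_prev, x = x, x - lr * v
    for _ in range(steps - 1):
        v = storm_step(v, x, x_prev, draw(rng, 1)[0], momentum)
        x_prev, x = x, x - lr * v
    return x


if __name__ == "__main__":
    # The toy objective is minimized at x = 2.
    print("SPIDER iterate:", run_spider())
    print("STORM iterate: ", run_storm())
```

Setting `momentum=1.0` in `storm_step` reduces it to a plain stochastic gradient, while `momentum=0.0` recovers the single-sample SPIDER update; intermediate values trade off the two, which is what lets STORM drop the periodic large batch.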

Paper Structure (14 sections, 8 theorems, 98 equations, 2 figures, 4 algorithms)

Introduction
Main contributions
Preliminaries
A SPIDER-based alternating variance-reduced algorithm
The proposed algorithm ALS-SPIDER
Theoretical Analysis
Fundamental Lemmas
Convergence and Complexity Analysis
A STORM-based alternating stochastic variance-reduced algorithm
Convergence and Complexity Analysis
Experiments
Synthetic Bilevel Problem
Data Hyper-Cleaning
Conclusion

Key Result

Lemma 1

Suppose Assumptions 1 and 2 hold. Then $\nabla \Phi(x)$ in (2), $y^*(x)$ in problem (1), and $v^*(x)$ in (4) are $L_{\Phi}$-, $C_{y}$-, and $C_{v}$-Lipschitz continuous, respectively, where

Figures (2)

Figure 1: Comparison of VRBO and ALS-SPIDER (in the first two columns), and ALS-SPIDER and ALS-STORM (in the last two columns) for the synthetic bilevel problem. We show the convergence performance w.r.t. outer loop iteration $k$ (resp. running time) in the first (resp. second) row.
Figure 2: Comparison of VRBO and ALS-SPIDER (in the first two columns), and ALS-SPIDER and ALS-STORM (in the last two columns) for the data hyper-cleaning task on the FashionMNIST dataset. Results w.r.t. outer iteration $k$ (resp. running time) are in the first (resp. second) row.

Theorems & Definitions (20)

Lemma 1
Definition 1
Definition 2
Remark 1
Remark 2
Lemma 2
Proof 1
Lemma 3
Proof 2
Lemma 4
...and 10 more
