Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling

Junyi Li, Heng Huang

TL;DR

This work introduces a without-replacement sampling based algorithm which achieves a faster convergence rate compared to its counterparts that rely on independent sampling and extends the discussion to conditional bilevel optimization and two special cases: minimax and compositional optimization.

Abstract

Bilevel Optimization has experienced significant advancements recently with the introduction of new efficient algorithms. Mirroring the success in single-level optimization, stochastic gradient-based algorithms are widely used in bilevel optimization. However, a common limitation in these algorithms is the presumption of independent sampling, which can lead to increased computational costs due to the complicated hyper-gradient formulation of bilevel problems. To address this challenge, we study the example-selection strategy for bilevel optimization in this work. More specifically, we introduce a without-replacement sampling based algorithm which achieves a faster convergence rate compared to its counterparts that rely on independent sampling. Beyond the standard bilevel optimization formulation, we extend our discussion to conditional bilevel optimization and also two special cases: minimax and compositional optimization. Finally, we validate our algorithms over both synthetic and real-world applications. Numerical results clearly showcase the superiority of our algorithms.
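The distinction between independent sampling and without-replacement sampling can be made concrete with a small sketch of how the example order is generated. This is only an illustration of the sampling schemes themselves, not the paper's bilevel algorithm; the function names and the `seed` parameter are assumptions introduced here. Random reshuffling and shuffle-once are two common without-replacement orders; the abstract does not say which orders the paper analyzes.

```python
import random


def independent_order(n, num_steps, seed=0):
    """Independent (with-replacement) sampling: every step draws an index
    uniformly from {0, ..., n-1}, so an example can repeat within an epoch."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(num_steps)]


def random_reshuffling_order(n, num_epochs, seed=0):
    """Without-replacement sampling: draw a fresh permutation each epoch,
    so every example is visited exactly once per epoch."""
    rng = random.Random(seed)
    order = []
    for _ in range(num_epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order


def shuffle_once_order(n, num_epochs, seed=0):
    """Without-replacement sampling: draw one permutation up front and
    reuse it in every epoch."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm * num_epochs
```

A stochastic gradient loop would consume one of these index sequences in place of independent draws; under either without-replacement order, each block of `n` consecutive indices covers the whole dataset exactly once.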

Paper Structure

This paper contains 19 sections, 17 theorems, 93 equations, 4 figures, 2 tables, 5 algorithms.

Key Result

Theorem 4.6

Suppose Assumptions assumption:f_smoothness-assumption:bounded_diff2 are satisfied, and Assumption assumption:ave-grad-err is satisfied with some $\alpha$ and $C$ for the example order we use in Algorithm alg:BiO. We choose learning rates $\eta_t = \eta$, $\gamma_t = c_1 \eta$, $\rho_t = c_2 \eta$ a where $c_1$, $c_2$, $\tilde{L}_0$, $\tilde{L}$ are constants related to the smoothness parameters o

Figures (4)

  • Figure 1: Comparison of different sampling strategies for the Invariant Risk-Minimization task.
  • Figure 2: Comparison of different algorithms for the Hyper-data Cleaning task. The top two plots show validation error/F1 score vs Number of Hyper-Iterations and the bottom two plots show validation error/F1 score vs Running Time. The fraction of the noisy samples is 0.6.
  • Figure 3: Comparison of different algorithms for the Hyper-Representation Task over the Omniglot Dataset. From Left to Right: 5-way-1-shot, 5-way-5-shot, 20-way-1-shot, 20-way-5-shot.
  • Figure 4: Comparison of different algorithms for the Hyper-Representation Task over the MiniImageNet Dataset. From Left to Right: 5-way-1-shot, 5-way-5-shot, 20-way-1-shot, 20-way-5-shot.

Theorems & Definitions (29)

  • Theorem 4.6
  • Theorem 4.8
  • Proposition B.7
  • Lemma B.8
  • proof
  • Lemma B.9
  • proof
  • Lemma B.10
  • proof
  • Lemma B.11
  • ...and 19 more