Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

Yifan Yang, Hao Ban, Minhui Huang, Shiqian Ma, Kaiyi Ji

TL;DR

These methods are the first to completely eliminate the need for stepsize tuning while retaining theoretical guarantees. They achieve performance comparable to existing well-tuned approaches and are more robust to the selection of initial stepsizes.

Abstract

Bilevel optimization has recently attracted considerable attention due to its abundant applications in machine learning problems. However, existing methods rely on prior knowledge of problem parameters to determine stepsizes, resulting in significant effort in tuning stepsizes when these parameters are unknown. In this paper, we propose two novel tuning-free algorithms, D-TFBO and S-TFBO. D-TFBO employs a double-loop structure with stepsizes adaptively adjusted by the "inverse of cumulative gradient norms" strategy. S-TFBO features a simpler fully single-loop structure that updates three variables simultaneously with a theory-motivated joint design of adaptive stepsizes for all variables. We provide a comprehensive convergence analysis for both algorithms and show that D-TFBO and S-TFBO respectively require $O(\frac{1}{\epsilon})$ and $O(\frac{1}{\epsilon}\log^4(\frac{1}{\epsilon}))$ iterations to find an $\epsilon$-accurate stationary point, (nearly) matching their well-tuned counterparts using the information of problem parameters. Experiments on various problems show that our methods achieve performance comparable to existing well-tuned approaches, while being more robust to the selection of initial stepsizes. To the best of our knowledge, our methods are the first to completely eliminate the need for stepsize tuning, while achieving theoretical guarantees.
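
The "inverse of cumulative gradient norms" idea mentioned in the abstract can be illustrated on a much simpler single-level problem. The sketch below is not D-TFBO or S-TFBO; it is an AdaGrad-Norm-style gradient descent on a toy quadratic, assuming the stepsize at step $t$ is $1/\alpha_t$ with $\alpha_t^2 = \alpha_0^2 + \sum_{i \le t} \|g_i\|^2$. The function name `adaptive_gd`, the test problem, and the iteration count are hypothetical choices made for illustration, and the exact accumulation rule used in the paper may differ.

```python
import numpy as np

def adaptive_gd(grad, x0, alpha0=1.0, num_iters=5000):
    """Gradient descent with stepsize 1/alpha_t (AdaGrad-Norm style).

    alpha_t^2 = alpha0^2 + sum of squared gradient norms seen so far,
    so no smoothness constant is needed to choose the stepsize.
    This is an illustrative single-level sketch, not the paper's algorithm.
    """
    x = np.asarray(x0, dtype=float)
    alpha_sq = alpha0 ** 2
    for _ in range(num_iters):
        g = grad(x)
        alpha_sq += float(g @ g)        # accumulate squared gradient norm
        x = x - g / np.sqrt(alpha_sq)   # step with the inverse of the accumulator
    return x

# Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x (assumed test problem).
A = np.diag([1.0, 10.0])
b = np.array([1.0, -2.0])
grad = lambda x: A @ x - b

# Try initial stepsize parameters spanning four orders of magnitude.
results = {}
for alpha0 in (0.01, 1.0, 100.0):
    x_final = adaptive_gd(grad, np.zeros(2), alpha0=alpha0)
    results[alpha0] = float(np.linalg.norm(grad(x_final)))
    print(f"alpha0={alpha0:>6}: final gradient norm = {results[alpha0]:.2e}")
```

When the initial stepsize is too large, the first few gradients are large, the accumulator grows quickly, and the stepsize shrinks into a stable range on its own. When it is too small, the iteration is simply slower at first. This is the kind of robustness to the initial stepsize that the abstract describes, shown here only for a toy problem.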

Paper Structure

This paper contains 38 sections, 26 theorems, 158 equations, 2 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

Suppose the iteration rounds to update $\{x,y,v\}$ are $\{T_1,T_2,T_3\}$ and $\{\alpha_t, \beta_t, \gamma_t\}$ are generated by D-TFBO or S-TFBO. For any $C_\alpha \geq \alpha_0$, $C_\beta \geq \beta_0$, $C_\gamma \geq \gamma_0$, we have

Figures (2)

  • Figure 1: Comparison with other bilevel methods. (a) Regularization selection on Covtype dataset. (b) Data hyper-cleaning on MNIST dataset.
  • Figure 2: Comparison of running time with other bilevel optimization methods.

Theorems & Definitions (48)

  • Remark 1: Extension to a tunable version with problem-parameter-free tuning coefficients.
  • Remark 2: Extension to a tunable version with problem-parameter-free tuning coefficients.
  • Definition 1
  • Definition 2
  • Remark 3
  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Corollary 1
  • Proposition 3
  • ...and 38 more