
Double Momentum Method for Lower-Level Constrained Bilevel Optimization

Wanli Shi, Yi Chang, Bin Gu

TL;DR

A new hypergradient for LCBO is derived via the nonsmooth implicit function theorem, avoiding the restrictive assumptions of prior work, and a single-loop single-timescale algorithm is proposed that returns a $(\delta, \epsilon)$-stationary point within $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations.

Abstract

Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Recently, many hypergradient methods have been proposed as effective solutions for solving large-scale problems. However, current hypergradient methods for lower-level constrained bilevel optimization (LCBO) problems need very restrictive assumptions, namely, that the optimality conditions satisfy the differentiability and invertibility conditions, and they lack a solid analysis of the convergence rate. What's worse, existing methods require double-loop updates, which are sometimes less efficient. To solve these problems, in this paper, we propose a new hypergradient of LCBO leveraging the nonsmooth implicit function theorem instead of relying on the restrictive assumptions. In addition, we propose a \textit{single-loop single-timescale} algorithm based on the double-momentum method and adaptive step size method, and prove it can return a $(\delta, \epsilon)$-stationary point within $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of our proposed method.
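The abstract's algorithmic recipe, one momentum-averaged gradient step per level per iteration plus a decaying upper-level step size, can be sketched in a few lines of Python. Everything below is an illustrative assumption rather than the paper's method: the toy quadratic problem, the step-size constants, and the explicit chain-rule hypergradient are placeholders, whereas the paper constructs its hypergradient from the nonsmooth implicit function theorem so that lower-level constraints can be handled.

import numpy as np

# Hedged sketch of a single-loop, single-timescale double-momentum scheme for
# min_x f(x, y*(x)) s.t. y*(x) = argmin_y g(x, y), on a toy quadratic problem:
# f(x, y) = 0.5||x||^2 + 0.5||y - 1||^2, g(x, y) = 0.5||y - A x||^2.
rng = np.random.default_rng(0)
d1, d2 = 5, 5
A = rng.standard_normal((d2, d1))

def grad_f_x(x, y):   # upper-level partial gradient in x
    return x

def grad_f_y(x, y):   # upper-level partial gradient in y
    return y - 1.0

def grad_g_y(x, y):   # lower-level gradient; g(x, .) is strongly convex
    return y - A @ x

def hypergrad(x, y):
    # Toy hypergradient: here y*(x) = A x, so dy*/dx = A and
    # df/dx = grad_f_x + (dy*/dx)^T grad_f_y.  The paper replaces this
    # chain rule with a nonsmooth implicit-function-theorem construction.
    return grad_f_x(x, y) + A.T @ grad_f_y(x, y)

x, y = np.zeros(d1), np.zeros(d2)
mx, my = np.zeros(d1), np.zeros(d2)   # momentum buffers (the "double" momentum)
beta, eta_y = 0.9, 0.1

for t in range(1, 2001):
    # Single loop: both levels take exactly one momentum step per iteration;
    # neither level is solved to optimality inside the loop.
    my = beta * my + (1 - beta) * grad_g_y(x, y)
    mx = beta * mx + (1 - beta) * hypergrad(x, y)
    eta_x = 0.5 / np.sqrt(t)          # a simple adaptive (decaying) step size
    y -= eta_y * my
    x -= eta_x * mx

print("||hypergradient|| at exit:", np.linalg.norm(hypergrad(x, y)))

In this toy the exact hypergradient is available in closed form; the point of the sketch is only the single-loop structure, in which the lower-level iterate y merely tracks y*(x) while x is updated concurrently.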

Paper Structure

This paper contains 30 sections, 15 theorems, 113 equations, 3 figures, 5 tables, 1 algorithm.

Key Result

Lemma 3.3

Under the paper's assumptions on the lower-level problem, the optimal solution to the lower-level problem is Lipschitz continuous with constant $L_g/\mu_g$.
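For context, the following is a textbook-style sketch of why this constant appears, assuming (as is standard in this literature; the paper's exact assumption labels are not reproduced here) that $g(x,\cdot)$ is $\mu_g$-strongly convex on the constraint set $Y$ and $\nabla_y g(\cdot, y)$ is $L_g$-Lipschitz in $x$:

\begin{align*}
\mu_g \|y^*(x_1)-y^*(x_2)\|^2
&\le \langle \nabla_y g(x_1,y^*(x_1)) - \nabla_y g(x_1,y^*(x_2)),\; y^*(x_1)-y^*(x_2)\rangle \\
&\le \langle \nabla_y g(x_2,y^*(x_2)) - \nabla_y g(x_1,y^*(x_2)),\; y^*(x_1)-y^*(x_2)\rangle \\
&\le L_g \|x_1-x_2\|\,\|y^*(x_1)-y^*(x_2)\|,
\end{align*}

where the second inequality combines the two variational inequalities $\langle \nabla_y g(x_i, y^*(x_i)),\, y - y^*(x_i)\rangle \ge 0$ for all $y \in Y$, $i = 1, 2$. Dividing through by $\|y^*(x_1)-y^*(x_2)\|$ (when nonzero) gives $\|y^*(x_1)-y^*(x_2)\| \le (L_g/\mu_g)\|x_1-x_2\|$.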

Figures (3)

  • Figure 1: Test accuracy against training time of all the methods in data hyper-cleaning.
  • Figure 2: Test accuracy against training time of all the methods in meta-learning.
  • Figure 3: Route map of convergence analysis.

Theorems & Definitions (32)

  • Definition 3.1
  • Definition 3.2
  • Lemma 3.3
  • Definition 3.4
  • Definition 3.5
  • Lemma 3.6
  • Proposition 3.7
  • Lemma 3.8
  • Proposition 3.9
  • Lemma 3.10
  • ...and 22 more