An Inexact Conditional Gradient Method for Constrained Bilevel Optimization

Nazanin Abolfazli; Ruichen Jiang; Aryan Mokhtari; Erfan Yazdandoost Hamedani

TL;DR

This paper introduces a novel single-loop projection-free method employing a nested approximation technique. The method not only has an improved per-iteration complexity compared to existing methods but also achieves optimal convergence rate guarantees that match the best-known complexity of projection-free algorithms for solving convex constrained single-level optimization problems.

Abstract

Bilevel optimization is an important class of optimization problems where one optimization problem is nested within another. While various methods have emerged to address unconstrained general bilevel optimization problems, there has been a noticeable gap in research on methods tailored for the constrained scenario. The few methods that do accommodate constrained problems often exhibit slow convergence rates or demand a high computational cost per iteration. To tackle this issue, our paper introduces a novel single-loop projection-free method employing a nested approximation technique. This approach not only has an improved per-iteration complexity compared to existing methods but also achieves optimal convergence rate guarantees that match the best-known complexity of projection-free algorithms for solving convex constrained single-level optimization problems. In particular, when the hyper-objective function corresponding to the bilevel problem is convex, our method requires $\tilde{\mathcal{O}}(\epsilon^{-1})$ iterations to find an $\epsilon$-optimal solution. Moreover, when the hyper-objective function is non-convex, our method's complexity for finding an $\epsilon$-stationary point is $\mathcal{O}(\epsilon^{-2})$. To showcase the effectiveness of our approach, we present a series of numerical experiments that highlight its superior performance relative to state-of-the-art methods.
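As background for the term "projection-free", the following is a minimal sketch of a classical single-level conditional gradient (Frank-Wolfe) iteration. It is not the bilevel method proposed in the paper. The quadratic objective, the $\ell_1$-ball constraint set, the step size $2/(k+2)$, and all function names (`lmo_l1_ball`, `frank_wolfe`) are illustrative assumptions. Each iteration calls a linear minimization oracle over the constraint set instead of computing a projection.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle: argmin of <grad, s> over ||s||_1 <= radius.

    The minimizer is a signed vertex of the l1 ball, so no projection is needed.
    """
    i = int(np.argmax(np.abs(grad)))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, lmo, num_iters=500):
    """Classical Frank-Wolfe with the open-loop step size 2 / (k + 2)."""
    x = x0.copy()
    for k in range(num_iters):
        s = lmo(grad_f(x))                  # one linear minimization per iteration
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s   # convex combination keeps x feasible
    return x

# Hypothetical test problem: minimize 0.5 * ||x - b||^2 over the unit l1 ball.
b = np.array([0.6, -0.1, 0.2])
x_hat = frank_wolfe(lambda x: x - b, np.zeros(3), lmo_l1_ball)
print("l1 norm of iterate:", np.sum(np.abs(x_hat)))
print("objective gap:", 0.5 * np.sum((x_hat - b) ** 2))
```

Because every iterate is a convex combination of feasible points, feasibility holds without any projection step. For convex smooth objectives this classical scheme has an $\mathcal{O}(\epsilon^{-1})$ iteration complexity, which is the single-level benchmark that the abstract refers to.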

Paper Structure (27 sections, 11 theorems, 62 equations, 11 figures, 3 tables, 1 algorithm)

This paper contains 27 sections, 11 theorems, 62 equations, 11 figures, 3 tables, 1 algorithm.

Introduction
Preliminaries
Motivating Examples
Assumptions and Definitions
Proposed Method
Main Algorithm
Convergence Analysis
Numerical Experiments
Matrix Completion with Denoising
Multi-Task Learning
Conclusion
Supporting Lemmas
Proof of Lemma \ref{lem:v_b}
Required Lemmas for Theorems \ref{thm:convex-upper-bound} and \ref{thm:nonconvex-upper-bound}
Proof of Lemma \ref{lem:ell-es}
...and 12 more sections

Key Result

Lemma 2.1

Suppose Assumptions \ref{assum:upper} and \ref{assum:lower} hold. Then for any $\mathbf{x},\Bar{\mathbf{x}} \in \mathcal{X}$, the following results hold. (I) $\|\mathbf{y}^*(\mathbf{x}) - \mathbf{y}^*(\Bar{\mathbf{x}})\| \leq \mathbf{L}_{\mathbf{y}}\| \mathbf{x} - \Bar{\mathbf{x}}\|$, where $\mathbf{L}_{\mathbf

Figures (11)

Figure 1: Performance of IBCG vs SBFW and TTSA on problem \ref{ex:Matrix_comp} for the synthetic dataset. Plots from left to right: normalized error $(\Bar{e})$, $\|\nabla_y g(\mathbf{x}_k,\mathbf{y}_k)\|$, and $f(\mathbf{x}_k,\mathbf{y}_k)$ over time.
Figure 2: Performance of IBCG (blue) vs SBFW (red) and TTSA (yellow) on problem \ref{ex:Matrix_comp} for the MovieLens dataset. Plots from left to right: normalized error $(\Bar{e})$, $\|\nabla_y g(\mathbf{x}_k,\mathbf{y}_k)\|$, and $f(\mathbf{x}_k,\mathbf{y}_k)$ over time.
Figure 3: Performance of IBCG vs SBFW on problem \ref{ex:MTL_bi} for real dataset1. Plots from left to right: $\|\nabla_W g(\Omega_k,W_k)\|$ and $f(\Omega_k,W_k)$ in terms of number of iterations.
Figure 4: Performance of IBCG vs SBFW on problem \ref{ex:MTL_bi} for real dataset1. Plots from left to right: $\|\nabla_W g(\Omega_k,W_k)\|$ and $f(\Omega_k,W_k)$ in terms of running time.
Figure 5: Performance of IBCG (blue) vs SBFW (red) on problem \ref{ex:toy} when $\mu_g = 1$. Plots from left to right: trajectories of $\theta_k$ and $f(\lambda_k,\theta_k) - f^*$.
...and 6 more figures

Theorems & Definitions (20)

Remark 2.1
Lemma 2.1
Definition 2.1
Lemma 4.1
Theorem 4.2: Convex bilevel
Corollary 4.3
Theorem 4.4: Non-convex bilevel
Corollary 4.5
Remark 4.1
Remark 4.2
...and 10 more