Perturbed Gradient Descent via Convex Quadratic Approximation for Nonconvex Bilevel Optimization

Nazanin Abolfazli; Sina Sharifi; Mahyar Fazlyab; Erfan Yazdandoost Hamedani

Perturbed Gradient Descent via Convex Quadratic Approximation for Nonconvex Bilevel Optimization

Nazanin Abolfazli, Sina Sharifi, Mahyar Fazlyab, Erfan Yazdandoost Hamedani

TL;DR

This paper introduces a discretized variant of RXGF and formulate convex quadratic program subproblems with closed-form solutions and provides a rigorous convergence analysis, demonstrating that under the existence of a KKT point and a regularity assumption {(lower-level gradient PL assumption)}, the method achieves an iteration complexity of 1/1 in terms of the squared norm of the KKT residual for the reformulated problem.

Abstract

Bilevel optimization is a fundamental tool in hierarchical decision-making and has been widely applied to machine learning tasks such as hyperparameter tuning, meta-learning, and continual learning. While significant progress has been made in bilevel optimization, existing methods predominantly focus on the {nonconvex-strongly convex, or the} nonconvex-PL settings, leaving the more general nonconvex-nonconvex framework underexplored. In this paper, we address this gap by developing an efficient gradient-based method inspired by the recently proposed Relaxed Gradient Flow (RXGF) framework with a continuous-time dynamic. In particular, we introduce a discretized variant of RXGF and formulate convex quadratic program subproblems with closed-form solutions. We provide a rigorous convergence analysis, demonstrating that under the existence of a KKT point and a regularity assumption {(lower-level gradient PL assumption)}, our method achieves an iteration complexity of $\mathcal{O}(1/ε^{1.5})$ in terms of the squared norm of the KKT residual for the reformulated problem. Moreover, even in the absence of the regularity assumption, we establish an iteration complexity of $\mathcal{O}(1/ε^{3})$ for the same metric. Through extensive numerical experiments on convex and nonconvex synthetic benchmarks and a hyper-data cleaning task, we illustrate the efficiency and scalability of our approach.

Perturbed Gradient Descent via Convex Quadratic Approximation for Nonconvex Bilevel Optimization

TL;DR

Abstract

Perturbed Gradient Descent via Convex Quadratic Approximation for Nonconvex Bilevel Optimization

TL;DR

Abstract

Paper Structure

Table of Contents

Key Result

Figures (5)

Theorems & Definitions (20)