A Hybrid Subgradient Method for Nonsmooth Nonconvex Bilevel Optimization
Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh
TL;DR
The paper tackles nonsmooth, nonconvex bilevel optimization (BLO) by recasting the problem as a constrained optimization problem over the manifold of lower-level (LL) stationary points, under mild LICQ-like regularity conditions. It develops a two-timescale momentum-accelerated subgradient method (TMG) with local convergence guarantees and a feasibility restoration scheme (FRG) that drives iterates toward the feasible region, then combines the two in a hybrid algorithm that adaptively switches between phases and converges globally. The analysis uses a conservative-field framework and an exact penalty function $h_{\beta}$ to connect BLO to a single-level problem and to establish that the interpolated processes converge to first-order stationary points. Preliminary numerical experiments indicate faster convergence and better solution quality than existing methods, suggesting practical potential for large-scale nonsmooth BLO. Because both components use only first-order derivatives, the approach is Hessian-free, which makes it a candidate for bilevel problems in learning and hyperparameter tuning.
Abstract
In this paper, we focus on the nonconvex-nonconvex bilevel optimization problem (BLO), where both the upper-level and lower-level objectives are nonconvex and the upper-level objective is possibly nonsmooth. We develop a two-timescale momentum-accelerated subgradient method (TMG) and establish its local convergence when it is initialized within a sufficiently small neighborhood of the feasible region. To obtain a globally convergent algorithm for (BLO), we introduce a feasibility restoration scheme (FRG) that drives iterates toward the feasible region. Both (TMG) and (FRG) require only the first-order derivatives of the upper-level and lower-level objective functions, which keeps the per-iteration computation cheap in practice. We then develop a novel hybrid method that alternates between (TMG) and (FRG) and adaptively estimates its hyperparameters. Under mild conditions, we establish the global convergence of the proposed algorithm. Preliminary numerical experiments demonstrate its efficiency and promising potential.
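For concreteness, the problem class described above can be sketched as follows. The symbols $f$ (upper-level objective), $g$ (lower-level objective), $x$, and $y$ are notation assumed here for illustration and may differ from the paper's own. The second display is one common way to express feasibility through lower-level stationarity when $g$ is differentiable in $y$; it is not necessarily the exact formulation the authors analyze.

$$\min_{x,\,y} \; f(x, y) \quad \text{s.t.} \quad y \in \arg\min_{y'} \; g(x, y')$$

$$\min_{x,\,y} \; f(x, y) \quad \text{s.t.} \quad \nabla_y g(x, y) = 0$$

Under the second form, the distance from feasibility can be measured with first-order information alone, for example through $\|\nabla_y g(x, y)\|$, which is consistent with the statement that both (TMG) and (FRG) require only first-order derivatives.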
