A Correspondence-Driven Approach for Bilevel Decision-making with Nonconvex Lower-Level Problems

Xiaotian Jiang, Jiaxiang Li, Mingyi Hong, Shuzhong Zhang

TL;DR

This work tackles bilevel optimization (BLO) with nonconvex lower-level objectives by replacing the traditional rational-follower assumption with a correspondence-driven hyperfunction $\phi^{\text{cd}}$ that reflects algorithmic, bounded rationality. To handle discontinuities, it introduces a Gaussian-smoothed approximation $\phi^{\text{cd}}_\xi$ and proves that the smoothed value and gradient converge to their unsmoothed counterparts at appropriate points. It then develops SCiNBiO, a biased SGD-based method that uses a cubic-regularized Newton lower-level solver, and provides convergence and oracle-complexity guarantees, including a fold-bifurcation-based refinement of the lower-level complexity. The framework leverages a prevalence-based perspective on regularity to address bifurcation phenomena, linking fold bifurcations from dynamical systems to the geometry of the lower-level landscape. Experiments demonstrate SCiNBiO’s robustness to nonconvexity and its superiority over competing BLO methods in both minimax and hyperparameter-optimization tasks.

Abstract

We consider bilevel optimization problems with general nonconvex lower-level objectives and show that the classical hyperfunction-based formulation is unsettled, since the global minimizer of the lower-level problem is generally unattainable. To address this issue, we propose a correspondence-driven hyperfunction $\phi^{\text{cd}}$. In this formulation, the follower is modeled not as a rational agent always attaining a global minimizer, but as an algorithm-based bounded rational agent whose decisions are produced by a fixed algorithm with initialization and step size. Since $\phi^{\text{cd}}$ is generally discontinuous, we apply Gaussian smoothing to obtain a smooth approximation $\phi^{\text{cd}}_\xi$, then show that its value and gradient converge to those of $\phi^{\text{cd}}$. In the nonconvex setting, we identify that bifurcation phenomena, which arise when $g(x,\cdot)$ has a degenerate stationary point, pose a key challenge for hyperfunction-based methods. This is especially the case when $\phi^{\text{cd}}_\xi$ is solved using gradient methods. To overcome this challenge, we analyze the geometric structure of the bifurcation set under some weak assumptions. Building on these results, we design a biased projected SGD-based algorithm SCiNBiO to solve $\phi^{\text{cd}}_\xi$ with a cubic-regularized Newton lower-level solver. We also provide convergence guarantees and oracle complexity bounds for the upper level. Finally, we connect bifurcation theory from dynamical systems to the bilevel setting and define the notion of fold bifurcation points in this setting. Under the assumption that all degenerate stationary points are fold bifurcation points, we establish the oracle complexity of SCiNBiO for the lower-level problem.
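
To make the algorithm-based follower concrete, the minimal sketch below runs fixed-step gradient descent from a fixed initialization on the one-dimensional lower-level objective $g(x,y)=(y-x)^4-2(y-x)^2$ listed under Figure 3. The helper name `y_cd`, the step size 0.05, and the iteration budget are illustrative assumptions, not values taken from the paper, and plain gradient descent stands in for whatever fixed algorithm the follower actually uses.

```python
def grad_y(x: float, y: float) -> float:
    """Partial derivative in y of g(x, y) = (y - x)**4 - 2 * (y - x)**2."""
    u = y - x
    return 4.0 * u**3 - 4.0 * u


def y_cd(x: float, y0: float = 0.0, step: float = 0.05, iters: int = 500) -> float:
    """Follower's response: gradient descent with a fixed start and fixed step.

    Hypothetical helper; the step size and iteration budget are assumptions.
    """
    y = y0
    for _ in range(iters):
        y -= step * grad_y(x, y)
    return y


if __name__ == "__main__":
    # Sweep the upper-level variable across x = 0 and watch the response jump.
    for x in (-0.1, -0.01, 0.01, 0.1):
        print(f"x = {x:+.2f}  ->  y_cd(x) = {y_cd(x):+.4f}")
```

Under these settings the iterate settles near $x+1$ for slightly negative $x$ and near $x-1$ for slightly positive $x$, so the follower's response jumps by roughly 2 across $x=0$ even though $g$ itself is smooth.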

Paper Structure

This paper contains 35 sections, 22 theorems, 203 equations, 15 figures, 1 table, 1 algorithm.

Key Result

Theorem 2.1

For any smooth function $g(x,y)$, choose $a$ uniformly at random from $[-\nu,\nu]^m$, where $\nu>0$ is an arbitrary constant, and let $\widetilde{g}_a(x,y)$ denote the corresponding perturbation of $g(x,y)$. Then, with probability one (i.e., almost surely with respect to the random choice of $a$), $\widetilde{g}_a(x,y)$ is Morse in $y$ for almost every $x$.
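
As a hedged illustration of what "Morse in $y$ for almost every $x$" means, consider the toy function below. It is our own choice rather than an example from the paper, and it assumes the standard definition that a function is Morse when its Hessian is nonsingular at every stationary point.

```latex
\begin{aligned}
g(x,y) &= \tfrac{1}{3}y^{3} - xy, \qquad x, y \in \mathbb{R},\\
\partial_y g(x,y) &= y^{2} - x, \qquad \partial_{yy} g(x,y) = 2y,\\
x > 0 &:\ \text{stationary points } y = \pm\sqrt{x},\quad \partial_{yy} g = \pm 2\sqrt{x} \neq 0,\\
x < 0 &:\ \text{no stationary points, so the Morse condition holds vacuously},\\
x = 0 &:\ y = 0 \text{ is stationary with } \partial_{yy} g = 0 \quad (\text{degenerate}).
\end{aligned}
```

Here $g(x,\cdot)$ fails to be Morse only at $x=0$, a set of measure zero. The gradient flow $\dot{y} = -\partial_y g = x - y^{2}$ is the textbook saddle-node (fold) normal form, although the paper's formal definition of a fold bifurcation point may differ in detail.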

Figures (15)

  • Figure 1: Graph of $g(0,y)$ with three non-degenerate stationary points
  • Figure 2: Graph of $g(1,y)$ with one non-degenerate stationary point
  • Figure 3: Illustration of the lower-level objective $g(x,y)=(y-x)^4-2(y-x)^2$ when $x=0$. The function has two symmetric minimizers and a local maximizer (an unstable stationary point) at $y=0$. With fixed initialization $y_0=0$ and gradient descent, small changes in $x$ lead to different accumulation points, resulting in discontinuity in $y^{\text{cd}}(x)$ and hence in $\phi^{\text{cd}}(x)$.
  • Figure 4: Gaussian smoothing of the piecewise-defined function $f(x)=-x$ for $x\leq 0$, and $f(x)=1$ for $x>0$, using a Gaussian kernel with $\xi=0.05$. The original function is discontinuous at $x=0$, while the smoothed approximation $f_\xi(x)$ is smooth and closely follows $f(x)$ away from the discontinuity. The minimizer of the smoothed function approximates the left-sided minimum of the original function.
  • Figure 5: Visualization of the bifurcation point set $\widetilde{\mathcal{X}}$ of the function $g(x,y)=y^4+(x_1^2 - 5x_1x_2 + 2x_2^2 - 7x_1 + 8x_2 - 30)y^3 +(x_1^2 - 3x_1x_2 + 4x_2^2 - 5x_1 + 2x_2 - 40)y^2 +(x_1^2 - 5x_1x_2 + 2x_2^2 - 7x_1 + 8x_2 - 30)y$ over the domain $[-4,5]^2$. The set exhibits a finite stratified manifold structure.
  • ...and 10 more figures
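
The Gaussian smoothing described in the Figure 4 caption can be sketched numerically. The code below assumes the standard definition $f_\xi(x)=\mathbb{E}_{u\sim\mathcal{N}(0,1)}[f(x+\xi u)]$ (the paper's exact normalization may differ) and applies it to the piecewise function from that caption with $\xi=0.05$. The function names, sample size, and seed are hypothetical choices for illustration.

```python
import math

import numpy as np


def f(x):
    """Piecewise function from Figure 4: -x for x <= 0, and 1 for x > 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.0, -x, 1.0)


def smoothed_mc(x: float, xi: float = 0.05, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of f_xi(x) = E[f(x + xi * u)] with u ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    return float(np.mean(f(x + xi * u)))


def smoothed_exact(x: float, xi: float = 0.05) -> float:
    """Closed form of the same expectation, valid for this particular f only."""
    t = x / xi
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # P(u <= t)
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return -x * (1.0 - cdf) + xi * pdf + cdf


if __name__ == "__main__":
    for x in (-0.5, -0.1, 0.0, 0.1, 0.5):
        print(f"x = {x:+.2f}  MC = {smoothed_mc(x):.4f}  exact = {smoothed_exact(x):.4f}")
```

Away from the jump at $x=0$ the smoothed values track $f$ closely, while near the jump they interpolate smoothly between the two branches, which is the behavior the caption describes.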

Theorems & Definitions (44)

  • Example 2.1
  • Definition 2.1: Prevalence (Hunt et al., 1992)
  • Definition 2.2: Morse function
  • Theorem 2.1
  • Remark 2.1: Two viewpoints on “prevalent”
  • Definition 2.3: Bifurcation Point Set
  • Remark 2.2
  • Definition 3.1: Correspondence-driven Hyperfunction
  • Remark 3.1
  • Example 3.1
  • ...and 34 more
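
For the quartic listed under Figure 5, one hedged way to locate the bifurcation point set numerically is through the discriminant of $\partial_y g$. This sketch assumes, following the abstract, that a bifurcation point is an $x$ at which $g(x,\cdot)$ has a degenerate stationary point. The discriminant test is our own choice of technique, not necessarily how the authors produced the figure, and all function names are hypothetical.

```python
import numpy as np


def coeffs(x1, x2):
    """Coefficients p (of y**3 and y) and q (of y**2) in Figure 5's quartic."""
    p = x1**2 - 5 * x1 * x2 + 2 * x2**2 - 7 * x1 + 8 * x2 - 30
    q = x1**2 - 3 * x1 * x2 + 4 * x2**2 - 5 * x1 + 2 * x2 - 40
    return p, q


def cubic_discriminant(x1, x2):
    """Discriminant of dg/dy = 4 y^3 + 3 p y^2 + 2 q y + p.

    It vanishes exactly when dg/dy has a repeated root, i.e. when
    g(x, .) has a stationary point with zero second derivative.
    """
    p, q = coeffs(x1, x2)
    a, b, c, d = 4.0, 3.0 * p, 2.0 * q, p
    return (18 * a * b * c * d - 4 * b**3 * d + b**2 * c**2
            - 4 * a * c**3 - 27 * a**2 * d**2)


def sign_change_mask(n: int = 401):
    """Grid points of [-4, 5]^2 with a neighbor of different discriminant sign."""
    grid = np.linspace(-4.0, 5.0, n)
    x1, x2 = np.meshgrid(grid, grid, indexing="ij")
    s = np.sign(cubic_discriminant(x1, x2))
    mask = np.zeros(s.shape, dtype=bool)
    mask[:-1, :] |= s[:-1, :] != s[1:, :]
    mask[:, :-1] |= s[:, :-1] != s[:, 1:]
    return grid, mask


if __name__ == "__main__":
    grid, mask = sign_change_mask()
    print(f"{int(mask.sum())} of {mask.size} grid points straddle the zero set")
```

Grid points flagged by `sign_change_mask` approximate the curve where the discriminant crosses zero. Points where it touches zero without changing sign would be missed, so this is only a coarse visualization aid.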