
Bilevel gradient methods and the Morse parametric qualification condition

Jérôme Bolte, Quoc-Tung Le, Edouard Pauwels, Samuel Vaiter

TL;DR

This work introduces a Morse parametric qualification condition for bilevel optimization and shows that generic semi-algebraic lower-level functions are piecewise parametric Morse, so that their sets of critical points and local minima decompose into finite unions of manifolds. It analyzes two gradient-based solvers: (i) a single-step multi-step alternating method, which performs several gradient steps on the lower-level problem before each single upper-level step, and (ii) a differentiable programming approach, which optimizes a smooth surrogate obtained by differentiating through the inner loop. The paper proves convergence of the alternating scheme to approximate bilevel solutions under the parametric Morse qualification condition and standard regularity assumptions. It also reveals a pseudo-stability phenomenon for the differentiable programming approach, with potential instabilities and escape behaviors. Together, these results provide a principled framework for analyzing bilevel methods in nonconvex settings where the lower level has multiple solutions, and they offer guidance for meta-learning-inspired bilevel pipelines.
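A minimal sketch of the single-step multi-step scheme follows. It uses a hypothetical toy problem that does not come from the paper. The upper level is $f(x,y) = (y-1)^2/2 + x^2/2$ and the lower level is $g(x,y) = (y-x)^2/2$, which is strongly convex and hence Morse. Then $y^*(x) = x$, and the bilevel solution is $x = 1/2$. The outer direction is assumed here to be the implicit-differentiation estimate $\partial_x f - \partial^2_{xy} g\,(\partial^2_{yy} g)^{-1}\partial_y f$, evaluated at the current inexact inner iterate. The step sizes `eta` and `alpha`, and the names `hypergradient` and `single_step_multi_step`, are illustrative choices.

```python
# Hypothetical toy bilevel problem (illustration only):
#   upper level f(x, y) = (y - 1)**2 / 2 + x**2 / 2
#   lower level g(x, y) = (y - x)**2 / 2, so y*(x) = x and the bilevel optimum is x = 0.5


def grad_y_g(x, y):
    """Partial derivative of the lower-level objective g with respect to y."""
    return y - x


def hypergradient(x, y):
    """Implicit-differentiation estimate df/dx - (g_xy / g_yy) * df/dy,
    evaluated at the current (approximate) inner iterate y."""
    df_dx = x
    df_dy = y - 1.0
    g_xy = -1.0  # mixed second derivative of g
    g_yy = 1.0   # second derivative of g in y (nonzero: nondegenerate critical point)
    return df_dx - g_xy / g_yy * df_dy


def single_step_multi_step(x, y, k=10, eta=0.5, alpha=0.1, n_outer=200):
    """Alternate k gradient steps on the lower level with one step on the upper level."""
    for _ in range(n_outer):
        # k gradient steps on g(x, .), warm-started at the previous inner iterate
        for _ in range(k):
            y -= eta * grad_y_g(x, y)
        # a single gradient step on the upper-level variable
        x -= alpha * hypergradient(x, y)
    return x, y


x_final, y_final = single_step_multi_step(0.0, 0.0)
print(f"x = {x_final:.6f}, y = {y_final:.6f}")
```

The inner loop is warm-started from the previous inner iterate, so a small fixed `k` is enough on this easy problem. The remaining error in `y` makes each outer step a slightly biased gradient step.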

Abstract

We introduce the Morse parametric qualification condition for bilevel programming. Generic semi-algebraic functions are Morse parametric in a piecewise sense. Thus, bilevel programs with a Morse parametric lower level constitute a relevant intermediate class between strongly convex and fully generic lower levels. In this framework, we study bilevel gradient algorithms with two strategies: the single-step multi-step strategy, which involves a sequence of steps on the lower-level problems followed by one step on the upper-level problem, and a differentiable programming strategy that optimizes a smooth approximation of the bilevel problem. While the first is shown to be a biased gradient method on the problem with rich properties, the second, inspired by meta-learning applications, is less stable but offers simplicity and ease of implementation.
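The inner loop stops after finitely many steps, so the upper-level direction only estimates the true gradient of $x \mapsto f(x, y^*(x))$. The sketch below measures this bias after $k$ inner steps from a fixed starting point. It uses a hypothetical toy problem, $f(x,y) = (y-1)^2/2 + x^2/2$ and $g(x,y) = (y-x)^2/2$, for which $y^*(x) = x$ and the true gradient is $2x - 1$. In this strongly convex example the bias halves with each extra inner step. The example only illustrates what "biased" means here and says nothing about the paper's actual bias analysis. The function names are hypothetical.

```python
def inner_solve(x, y0, k, eta=0.5):
    """k gradient steps on the hypothetical lower level g(x, y) = (y - x)**2 / 2."""
    y = y0
    for _ in range(k):
        y -= eta * (y - x)
    return y


def hypergradient(x, y):
    """df/dx - (g_xy / g_yy) * df/dy with g_xy = -1, g_yy = 1, at an inexact inner point y."""
    return x + (y - 1.0)


def true_gradient(x):
    """Derivative of F(x) = f(x, y*(x)) = (x - 1)**2 / 2 + x**2 / 2."""
    return 2.0 * x - 1.0


x, y0 = 2.0, 0.0
# bias of the estimated upper-level direction after k = 0, 1, ..., 10 inner steps
biases = [abs(hypergradient(x, inner_solve(x, y0, k)) - true_gradient(x)) for k in range(11)]
for k, b in enumerate(biases):
    print(k, b)
```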

Paper Structure

This paper contains 29 sections, 22 theorems, 75 equations, 3 figures, 1 table.

Key Result

Proposition 3.4

Given a $C^2$, semi-algebraic function $g: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, the set of vectors $a \in \mathbb{R}^m$ such that $g(x,y) - \langle a, y\rangle$ is piecewise parametric Morse is semi-algebraic and dense in $\mathbb{R}^m$ (hence residual and of full measure).
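The one-dimensional example below illustrates the linear tilt in Proposition 3.4 under simplifying assumptions:

- $m = 1$.
- The hypothetical function $g(y) = y^3$ does not depend on $x$, so parametric Morse reduces to ordinary Morse.
- "Morse" is taken to mean that every critical point has a nonzero second derivative.

Under these assumptions, $g$ itself is not Morse, because $y = 0$ is a degenerate critical point. However, $y^3 - a y$ is Morse for every $a \neq 0$. When $a > 0$, its critical points $y = \pm\sqrt{a/3}$ satisfy $6y \neq 0$. When $a < 0$, it has no critical points at all. The exceptional set $\{0\}$ is semi-algebraic with empty interior, consistent with the statement.

```python
import math


def critical_points(a):
    """Real critical points of y -> y**3 - a*y, i.e. real solutions of 3*y**2 = a."""
    if a > 0:
        r = math.sqrt(a / 3.0)
        return [-r, r]
    if a == 0:
        return [0.0]
    return []  # no real critical points when a < 0


def is_morse(a, tol=1e-12):
    """True if every critical point of y**3 - a*y has a nonzero second derivative 6*y."""
    return all(abs(6.0 * y) > tol for y in critical_points(a))


for a in [-1.0, -1e-3, 0.0, 1e-3, 1.0]:
    print(a, is_morse(a))
```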

Figures (3)

  • Figure 1: Structure of $\mathrm{crit}_1 g$ (defined in the subsection on bilevel problems) under the qualification assumption; see also Proposition 3.6. The sets of critical points and local minima of $g$ decompose into a finite union of manifolds, represented by the branches $y^{(1)}(x), y^{(2)}(x), y^{(3)}(x)$. Saddle-point manifolds (blue) and local-minimum manifolds (red) illustrate the stratification of the critical set.
  • Figure 2: Illustration of the instability phenomenon of the diagonal method. Left: inner objective $g$ and outer objective $f$, together with the outcome $\mathcal{A}^k(z)$ of $k=9$ gradient steps on $g$ as a function of the initialization $z$, and the corresponding approximation $\varphi^k(z) = f(\mathcal{A}^k(z))$. Middle: value of $\varphi^k(y_l)$ along the outer iterations. Many iterates have objective values corresponding to solutions of the original bilevel problem, but the sequence tends to be attracted by the sharp global minimizer of $f$, which is repulsive for the diagonal method. Right: profile of $\varphi^k = f \circ \mathcal{A}^k$ for different values of $k$.
  • Figure 3: Same as Figure 2, with $f(x,y) = (y-2)^2$ (left). The recursion first spends time at values $\varphi^k(y_l) \sim f(-1)$ (middle) and then converges to a point corresponding to the global minimum of $f$. The specificity of this setting is that, as the inner iteration counter $k$ increases, the minimizer of $f \circ \mathcal{A}^k$ is pushed to infinity (right).
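The sketch below shows the smooth approximation $\varphi^k = f \circ \mathcal{A}^k$ used in Figures 2 and 3. Here the outer variable is the initialization $z$ of $k$ gradient steps on $g$. The derivative of $\varphi^k$ comes from forward-mode differentiation through the unrolled loop, applying $\frac{d}{dz}\big(y - \eta g'(y)\big) = \big(1 - \eta g''(y)\big)\frac{dy}{dz}$ at each step. Several ingredients are assumptions chosen for illustration:

- the functions $g(y) = (y^2-1)^2/4$ and $f(y) = (y-2)^2$;
- the step sizes;
- the plain gradient-descent outer loop `unrolled_outer_descent`.

None of these is the paper's experimental setup or its exact diagonal method.

```python
# Hypothetical 1-D instance: inner g(y) = (y**2 - 1)**2 / 4 (local minima at -1 and +1),
# outer f(y) = (y - 2)**2. Not the paper's test problem.


def g_prime(y):
    return y**3 - y


def g_second(y):
    return 3.0 * y**2 - 1.0


def f(y):
    return (y - 2.0) ** 2


def f_prime(y):
    return 2.0 * (y - 2.0)


def phi_and_grad(z, k=9, eta=0.1):
    """Return phi^k(z) = f(A^k(z)) and its derivative, by forward-mode
    differentiation through k unrolled gradient steps on g started at z."""
    y, dy_dz = z, 1.0
    for _ in range(k):
        # chain rule for y <- y - eta * g'(y), using the current (pre-update) y
        dy_dz *= 1.0 - eta * g_second(y)
        y -= eta * g_prime(y)
    return f(y), f_prime(y) * dy_dz


def unrolled_outer_descent(z, n_outer=50, alpha=0.05, k=9, eta=0.1):
    """Plain gradient descent on phi^k over the initialization z (hypothetical outer loop)."""
    history = []
    for _ in range(n_outer):
        value, grad = phi_and_grad(z, k, eta)
        history.append(value)
        z -= alpha * grad
    return z, history


z_final, history = unrolled_outer_descent(0.5)
print(f"z = {z_final:.4f}, phi^k(z) = {history[-1]:.4f}")
```

For every fixed $k$, $\mathcal{A}^k$ is a composition of smooth maps, so $\varphi^k$ is smooth and can be differentiated directly. This simplicity is what makes the strategy easy to implement.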

Theorems & Definitions (51)

  • Remark 2.1: Machine learning perspective
  • Definition 3.1: Morse and parametric Morse functions
  • Example 3.2
  • Definition 3.3: Piecewise parametric Morse functions
  • Proposition 3.4: Genericity of piecewise parametric Morse functions
  • proof: Proof of Proposition 3.4
  • Example 3.5
  • Proposition 3.6: Critical points and local minima are finite unions of manifolds
  • proof
  • Lemma 3.7
  • ...and 41 more
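The toy instance below shows the kind of structure described by Proposition 3.6 and pictured in Figure 1. It uses a hypothetical lower-level function with $y \in \mathbb{R}^2$:

$$g(x, y) = \big((y_1 - x)^2 - 1\big)^2/4 + y_2^2/2.$$

For every $x$, its critical points in $y$ are exactly three:

- $y^{(1)}(x) = (x-1, 0)$, with Hessian $\mathrm{diag}(2, 1)$;
- $y^{(2)}(x) = (x, 0)$, with Hessian $\mathrm{diag}(-1, 1)$;
- $y^{(3)}(x) = (x+1, 0)$, with Hessian $\mathrm{diag}(2, 1)$.

The critical set is therefore a union of three smooth curves: two local-minimum branches and one saddle branch, all nondegenerate. The sketch classifies each branch by the signs of the Hessian entries. This shortcut is valid only because the Hessian is diagonal for this particular $g$.

```python
def critical_branches(x):
    """Critical points of y -> g(x, y) for the hypothetical
    g(x, y) = ((y1 - x)**2 - 1)**2 / 4 + y2**2 / 2, each labelled by type."""
    branches = []
    # with u = y1 - x, the y1-derivative is u**3 - u, whose roots are -1, 0, 1; the y2-derivative is y2
    for u in (-1.0, 0.0, 1.0):
        y = (x + u, 0.0)
        hessian_diag = (3.0 * u**2 - 1.0, 1.0)  # Hessian in y is diagonal for this g
        if any(h == 0.0 for h in hessian_diag):
            kind = "degenerate"
        elif all(h > 0.0 for h in hessian_diag):
            kind = "local minimum"
        else:
            kind = "saddle"
        branches.append((y, kind))
    return branches


for x in (-2.0, 0.0, 0.5):
    print(x, critical_branches(x))
```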