CB$^2$O: Consensus-Based Bi-Level Optimization

Nicolás García Trillos, Sixu Li, Konstantin Riedl, Yuhua Zhu

TL;DR

This work introduces CB2O, a derivative-free, multi-particle method for nonconvex bi-level optimization where the upper-level objective $G$ is minimized over the set $\Theta$ of global minimizers of the lower-level objective $L$. CB2O constructs a consensus point $m^{G,L}_{\alpha,\beta}(\rho)$ by selecting a $\beta$-quantile of $L$ and applying a Laplace-type weighting of $G$, guiding particles toward the global bi-level minimizer $\theta_{\mathrm{good}}^*$. The authors prove existence and regularity of the mean-field CB2O dynamics, and establish global convergence in mean-field law to $\theta_{\mathrm{good}}^*$, using a novel quantitative quantiled Laplace principle (Q2LP) and a stability estimate for the consensus point under combined Wasserstein and $L^2$ perturbations. Extensive numerical experiments on constrained global optimization, sparse representation learning, and clustered federated learning illustrate CB2O’s practicality, efficiency, and robustness, demonstrating its potential as a principled metaheuristic for challenging bi-level problems.

Abstract

Bi-level optimization problems, where one wishes to find the global minimizer of an upper-level objective function over the globally optimal solution set of a lower-level objective, arise in a variety of scenarios throughout science and engineering, machine learning, and artificial intelligence. In this paper, we propose and investigate, analytically and experimentally, consensus-based bi-level optimization (CB$^2$O), a multi-particle metaheuristic derivative-free optimization method designed to solve bi-level optimization problems when both objectives may be nonconvex. Within the computation of the consensus point, our method leverages a carefully designed particle selection principle, implemented through a suitable choice of a quantile on the level of the lower-level objective, together with a Laplace principle-type approximation w.r.t. the upper-level objective function, to ensure that the bi-level optimization problem is solved in an intrinsic manner. We give an existence proof of solutions to a corresponding mean-field dynamics, for which we first establish the stability of our consensus point w.r.t. a combination of Wasserstein and $L^2$ perturbations, and subsequently resort to PDE considerations extending the classical Picard iteration to construct a solution. For such a solution, we provide a global convergence analysis in mean-field law showing that the solution of the associated nonlinear nonlocal Fokker-Planck equation converges exponentially fast to the unique solution of the bi-level optimization problem under suitable choices of the hyperparameters. The practicability and efficiency of our CB$^2$O algorithm are demonstrated through extensive numerical experiments in the settings of constrained global optimization, sparse representation learning, and robust (clustered) federated learning.
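The two-stage construction of the consensus point described in the abstract (quantile selection w.r.t. $L$, then Laplace-type weighting w.r.t. $G$) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `consensus_point`, the choice to keep the $\lceil \beta N \rceil$ best particles, and the subtraction of the minimum of $G$ for numerical stability are all assumptions made for illustration, and `lower` and `upper` are hypothetical vectorized callables standing in for $L$ and $G$.

```python
import numpy as np

def consensus_point(particles, lower, upper, alpha=10.0, beta=0.25):
    """Illustrative finite-particle consensus point (a sketch, not the paper's code).

    particles: array of shape (N, d)
    lower, upper: callables mapping an (N, d) array to N objective values
    """
    n = particles.shape[0]
    # Assumption: the quantile set keeps the ceil(beta * N) particles with smallest L.
    k = max(1, int(np.ceil(beta * n)))
    l_vals = lower(particles)
    selected = np.argsort(l_vals)[:k]

    # Laplace-type weights exp(-alpha * G) on the selected particles only.
    g_vals = upper(particles[selected])
    # Subtracting the minimum leaves the normalized weights unchanged
    # but avoids underflow for large alpha.
    weights = np.exp(-alpha * (g_vals - g_vals.min()))
    weights /= weights.sum()

    # Weighted average of the selected particles.
    return weights @ particles[selected]
```

With `alpha = 0` the sketch returns the plain average of the selected particles; increasing `alpha` concentrates the weight on the selected particle with the smallest value of $G$.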

Paper Structure

This paper contains 37 sections, 18 theorems, 186 equations, 8 figures, 1 algorithm.

Key Result

Theorem 2.4

Let $L \in \mathcal{C}(\mathbb{R}^d)$ and $G \in \mathcal{C}(\mathbb{R}^d)$ satisfy Assumptions \ref{['asm:minimizers']}--\ref{['asm:growthBound_G']}. Moreover, for $l \geq 0$, let $\rho_0 \in H^{l+2}(\mathbb{R}^d) \cap L^{\infty} (\mathbb{R}^d) \cap \mathcal{P}_4(\mathbb{R}^d)$ be such that $\theta_{\mathrm{good}}^* \in$ [...] Moreover, $\rho_t \in \mathcal{P}_4(\mathbb{R}^d)$ and the mapping $t \mapsto m^{G,L}_{\alpha, \beta}$ [...]

Figures (8)

  • Figure 1: An illustration of the CB2O algorithm \ref{['eq:dyn_micro']} and its working principles for solving bi-level optimization problems of the form \ref{['eq:bilevel_opt']}. We depict a typical setting of \ref{['eq:bilevel_opt']} in two dimensions with the lower-level objective function $L$ being a Himmelblau function (plotted as contours in the $xy$-plane) and with the upper-level objective function $G$ being a parabola (plotted as a surface). The set of global minimizers of $L$, i.e., the set $\Theta$, consists of the three green dots and the one green star that are plotted on the $xy$-plane. The green star identifies the global minimizer $\theta_{\mathrm{good}}^*$ of $L$ which is optimal w.r.t. the upper-level objective function $G$ among the points in $\Theta$, and thereby is the solution to the bi-level optimization problem \ref{['eq:bilevel_opt']}. The setting in this example satisfies the assumptions made later in the paper for our theoretical analysis. At every point in time, CB2O employs $N=50$ particles (depicted as yellow points, including both circles and filled dots on the $xy$-plane) and computes the consensus point $m^{G,L}_{\alpha, \beta}(\rho^N_t)$, to which all particles are attracted as they explore the space through noise (not depicted here). In order to compute the consensus point $m^{G,L}_{\alpha, \beta}(\rho^N_t)$, each particle first evaluates the lower-level objective function $L$. For a chosen quantile parameter $\beta=0.25$, the $\beta N$ particles that are best positioned w.r.t. $L$ (depicted as the yellow filled dots) are selected. They belong to the set $Q^L_{\beta}[\rho^N_t]$ defined in \ref{['eq:Qbeta_FiniteParticles']}, which serves as an approximation to the set $\Theta$ of global minimizers of $L$. Based on those points, the consensus point $m^{G,L}_{\alpha, \beta}(\rho^N_t)$ is computed, while the remaining particles (depicted as the yellow circles) are discarded from the computation of the consensus point.
Figuratively speaking, those particles that belong to the quantile set $Q^L_{\beta}[\rho^N_t]$ are lifted to the upper-level objective function $G$, which they evaluate in a second step. For a chosen weight parameter $\alpha=10$, the consensus point $m^{G,L}_{\alpha, \beta}(\rho^N_t)$ is then computed as in \ref{['eq:ConsensusPoint_FiniteParticles']}. It approximates the global minimizer $\theta_{\mathrm{good}}^*$ of the bi-level optimization problem \ref{['eq:bilevel_opt']}.
  • Figure 2: Illustration of Assumptions \ref{['asm:icpL']} and \ref{['asm:icpG']} on the lower-level objective function $L$ and the upper level objective function $G$.
  • Figure 3: The influence of the hyperparameter $\beta$. By decreasing the value of $\beta$ from left to right, we shrink the quantile set $Q^L_{\beta}[\varrho]$ (illustrated as yellow shadows on the $xy$ plane and projected onto the upper-level objective), thereby ensuring an increasingly finer approximation of the set $\Theta$ of global minimizers of $L$.
  • Figure 4: The influence of the hyperparameter $\alpha$. By increasing the value of $\alpha$ from left to right, we improve on the approximation of $\tilde{\theta}_{\mathrm{good}}$, i.e., how well $m^{G,L}_{\alpha, \beta}(\varrho)$, depicted as the orange dot in all three plots, eventually approximates $\tilde{\theta}_{\mathrm{good}}$.
  • Figure 5: An illustration that the consensus point $m^{G,L}_{\alpha, \beta}(\varrho)$ as defined in \ref{['eq:consensus_point']} cannot be stable w.r.t. Wasserstein perturbations of the measure $\varrho$. For this counterexample, consider the setting where the objective functions are $L(\theta) = \left\|{\theta}\right\|_2$ and $G(\theta) = \left\|{\theta}\right\|_2$. Moreover, let us set the quantile hyperparameter $\beta = 0.3$. On the left, in Figure \ref{['fig:counter_example_left']} we depict the measure $\varrho$ together with its corresponding consensus point $m^{G,L}_{\alpha, \beta}(\varrho)$. The two concentric circles are level sets of the function $L$ and we assume that the mass of measure $\varrho$ is equally split between the two circles and uniformly distributed over the circles. By construction, these two level sets correspond to the function values $\frac{2}{\beta} \int_{\beta/2}^{\beta} q_a^{L} da$ and $\frac{2}{\beta} \int_{\beta/2}^{\beta} q_a^{L} da + \delta_q$, respectively, where $q_a^L$ is the same quantile value for $\varrho$ or $\tilde{\varrho}$, since $a \leq \beta=0.3$. Consequently, if $R$ is set to be large enough, the measure $I^L_{\beta}[\varrho]$ coincides with $\varrho$ and thus, based on \ref{['eq:consensus_point']}, the consensus point $m^{G,L}_{\alpha, \beta}(\varrho)$ lies at the center of the circles. On the right, in Figure \ref{['fig:counter_example_right']}, we construct the measure $\widetilde{\varrho}$ and plot it together with its corresponding consensus point $m^{G,L}_{\alpha, \beta}(\widetilde{\varrho})$. The perturbed measure $\widetilde{\varrho}$ is obtained by shifting a quarter of the total mass of the measure $\varrho$ from the right side of the outer circle to the right by a distance of $s$ (depicted in blue in Figure \ref{['fig:counter_example_right']}). 
By definition, the measure $I^L_{\beta}[\widetilde{\varrho}]$ now includes the mass on the inner circle as well as the mass on the left side of the outer circle, but not the mass that has been shifted to the right. Consequently, the consensus point $m^{G,L}_{\alpha, \beta}(\widetilde{\varrho})$ shifts to the left, as shown in Figure \ref{['fig:counter_example_right']}. The size $c_\alpha$ of this shift, however, is independent of $s$.
  • ...and 3 more figures
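The caption of Figure 1 describes particles that are attracted to the consensus point while exploring the space through noise. The dynamics \ref{['eq:dyn_micro']} themselves are not reproduced in this summary, so the following sketch assumes the update form commonly used in consensus-based optimization: a drift toward the consensus point with strength `lam`, plus isotropic Gaussian noise scaled by `sigma` and by each particle's distance to the consensus point, discretized with an Euler-Maruyama step of size `dt`. The function name `cb2o_step` and all parameter names and defaults are hypothetical.

```python
import numpy as np

def cb2o_step(particles, consensus, lam=1.0, sigma=0.5, dt=0.01, rng=None):
    """One illustrative Euler-Maruyama step of a CBO-type particle update.

    Assumed form (not taken from the paper's equations):
        theta <- theta - lam * dt * (theta - m)
                 + sigma * sqrt(dt) * ||theta - m|| * xi,   xi ~ N(0, I)
    """
    rng = np.random.default_rng() if rng is None else rng
    diff = particles - consensus                       # shape (N, d)
    dist = np.linalg.norm(diff, axis=1, keepdims=True)  # shape (N, 1)
    noise = rng.standard_normal(particles.shape)
    # The drift pulls every particle toward the consensus point; the noise
    # term vanishes for particles that already sit at the consensus point.
    return particles - lam * dt * diff + sigma * np.sqrt(dt) * dist * noise
```

In a full iteration under these assumptions, the consensus point would be recomputed from the current particles before every call to `cb2o_step`.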

Theorems & Definitions (54)

  • Remark 1.1: Mean-field approximation for CB2O
  • Remark 1.2: Choice of the hyperparameter $\alpha$ in CBO-type methods
  • Remark 1.3: CB2O algorithm with additional gradient drift w.r.t. $L$
  • Remark 2.1
  • Definition 2.2: Weak solution to Fokker-Planck Equation \ref{['eq:fokker_planck']}
  • Theorem 2.4: Existence of regular solutions to the mean-field CB2O dynamics \ref{['eq:dyn_macro']} and \ref{['eq:fokker_planck']}
  • Remark 2.5
  • Theorem 2.7: Convergence of the mean-field CB2O dynamics \ref{['eq:dyn_macro']} and \ref{['eq:fokker_planck']}
  • Remark 2.8: Choice of $\delta_q$
  • Remark 2.9: Choice of hyperparameter $\beta$
  • ...and 44 more