Mean-Field Limits for Two-Layer Neural Networks Trained with Consensus-Based Optimization

William De Deyn, Michael Herty, Giovanni Samaey

TL;DR

This work investigates Consensus-Based Optimization (CBO) as a gradient-free training paradigm for two-layer neural networks and benchmarks it against Adam, introducing a hybrid Adam–CBO variant that accelerates convergence and a Multi-Task CBO variant that reduces memory overhead. It develops a mean-field framework by reformulating CBO in optimal-transport terms and analyzes both the infinite-width limit and the infinite-particle limit, proving variance decay and consensus. Empirical results on sine-approximation, MNIST, and multi-task settings demonstrate competitive performance, robustness, and scalability advantages of the proposed methods. The combination of OT-based mean-field analysis and practical CBO variants provides a principled pathway for scalable, global-optimization–oriented training of wide neural networks.

Abstract

We study Consensus-Based Optimization (CBO) for two-layer neural network training. We compare the performance of CBO against Adam on two test cases and demonstrate how a hybrid approach, combining CBO with Adam, provides faster convergence than CBO. Additionally, in the context of multi-task learning, we recast CBO into a formulation that offers less memory overhead. The CBO method allows for a mean-field limit formulation, which we couple with the mean-field limit of the neural network. To this end, we first reformulate CBO within the optimal transport framework. In the limit of infinitely many particles, we define the corresponding dynamics on the Wasserstein-over-Wasserstein space and show that the variance decreases monotonically.
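To make the gradient-free update concrete, the following is a minimal sketch of a generic CBO iteration on a toy objective. It is not the authors' implementation. The function names (`consensus_point`, `cbo_minimize`), the hyperparameters (`alpha`, `lam`, `sigma`, `dt`), the uniform initialization, and the component-wise (anisotropic) noise are all assumptions chosen for illustration, and the toy objective stands in for the empirical risk of a network.

```python
import numpy as np

def consensus_point(particles, losses, alpha):
    """Gibbs-weighted mean of the particles (weights favor low loss)."""
    # Shift by the minimum loss so the exponentials cannot underflow to all zeros.
    w = np.exp(-alpha * (losses - losses.min()))
    return (w[:, None] * particles).sum(axis=0) / w.sum()

def cbo_minimize(objective, dim, n_particles=50, n_steps=500,
                 alpha=30.0, lam=1.0, sigma=0.5, dt=0.05, seed=0):
    """Euler-Maruyama discretization of a generic CBO scheme (illustrative only).

    `objective` maps an array of shape (n_particles, dim) to losses of shape
    (n_particles,). No gradients of `objective` are used.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical initialization: uniform on [-3, 3]^dim.
    x = rng.uniform(-3.0, 3.0, size=(n_particles, dim))
    for _ in range(n_steps):
        x_alpha = consensus_point(x, objective(x), alpha)
        diff = x - x_alpha
        noise = rng.standard_normal(x.shape)
        # Drift pulls particles toward the consensus point; the noise scales
        # with the distance to it, so exploration vanishes at consensus.
        x = x - lam * dt * diff + sigma * np.sqrt(dt) * diff * noise
    return consensus_point(x, objective(x), alpha), x

if __name__ == "__main__":
    # Toy objective with minimizer at (1, 1); a stand-in for an empirical risk.
    toy = lambda x: ((x - 1.0) ** 2).sum(axis=1)
    best, particles = cbo_minimize(toy, dim=2, n_particles=200)
    print("consensus point:", best)
    print("per-coordinate spread:", particles.std(axis=0))
```

In this sketch the per-coordinate spread of the particles shrinks over the iterations, which is the finite-particle analogue of the variance decay discussed in the abstract. A hybrid scheme in the spirit of the Adam and CBO combination would interleave gradient-based steps with this update; that coupling is not shown here.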

Paper Structure

This paper contains 18 sections, 3 theorems, 71 equations, 8 figures.

Key Result

Proposition 1

The representations of the consensus point in $\mathbb{R}^{d}$ (Eq. cbo:consensus_point) and of the barycenter in $\mathcal{P}_{2}(\mathbb{R}^{d})$ (Eq. eq:barycenter) are equal in the following sense: where $d(\cdot, \cdot)$ denotes the distance function in $\mathbb{R}^{d}$ and $\mathcal{P}_{2}(\mathbb{R}^{d})$, respectively.
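For orientation, the Euclidean side of such a statement can be illustrated with the textbook form of the CBO consensus point. The notation below (particles $\theta^{i}$, objective $f$, inverse temperature $\alpha$, weights $w_{i}$) is assumed and may differ from the paper's Eq. cbo:consensus_point and Eq. eq:barycenter; it is a sketch of the standard finite-particle case, not the proposition's own display.

```latex
% Gibbs-weighted mean of N particles (standard CBO consensus point)
\theta_{\alpha} = \sum_{i=1}^{N} w_{i}\, \theta^{i},
\qquad
w_{i} = \frac{e^{-\alpha f(\theta^{i})}}{\sum_{j=1}^{N} e^{-\alpha f(\theta^{j})}} .

% The same point as a weighted barycenter: since the weights sum to one,
% the weighted mean minimizes the weighted squared Euclidean distance.
\theta_{\alpha} = \operatorname*{arg\,min}_{y \in \mathbb{R}^{d}}
\sum_{i=1}^{N} w_{i}\, \lvert y - \theta^{i} \rvert^{2} .
```

The second identity follows by setting the gradient in $y$ to zero: $\sum_{i} w_{i}(y - \theta^{i}) = 0$ gives $y = \sum_{i} w_{i} \theta^{i}$ because $\sum_{i} w_{i} = 1$.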

Figures (8)

  • Figure 1: Conceptual illustration of single-task versus Multi-Task CBO. In single-task CBO a single consensus point guides all particles toward the global minimizer of one empirical risk. In Multi-Task CBO the same particle ensemble is recycled across related tasks whose minimizers lie within the support of the common initialization $\rho^0$.
  • Figure 2: Empirical risk $\hat{R}(\bm{\theta})$ as a function of training epochs for a two-layer neural network trained with Adam and CBO. The figure displays the median empirical risk taken over 10 simulations.
  • Figure 3: Approximations of the sine function $\sin(2 \pi x)$ obtained with a two-layer neural network with $M = 100$ trained with CBO and Adam. For clarity, the plot shows only a subset of the training dataset.
  • Figure 4: Empirical risk $\hat{R}(\bm{\theta})$ as a function of training epochs for a two-layer neural network trained with Adam, CBO and the hybrid method (Adam + CBO) on the MNIST dataset.
  • Figure 5: The median and minimum empirical risk $\hat{R}(\bm{\theta})$ as a function of training epochs for two-layer neural networks trained with Multi-Task CBO. The median and minimum are taken over 100 different risk functions.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof