
Self-interacting approximation to McKean-Vlasov long-time limit: a Markov chain Monte Carlo method

Kai Du, Zhenjie Ren, Florin Suciu, Songbo Wang

TL;DR

The paper introduces a self-interacting (SI) diffusion as a scalable proxy for the long-time behavior of non-degenerate McKean–Vlasov (MKV) dynamics, replacing the mean-field interaction with an exponentially weighted occupation measure of the process's own past. It proves exponential ergodicity of the SI process via a reflection-coupling argument and provides quantitative bounds showing that, in the gradient setting, the SI stationary distribution approximates the MKV invariant measure increasingly well as the interaction rate $\lambda$ decreases. A broad class of dynamics is identified for which these results hold, and a concrete Curie–Weiss (ferromagnetic) example illustrates the methodology. The numerical application to training two-layer neural networks demonstrates a practical, single-particle mean-field approach; an annealing scheme for $\lambda$ outperforms runs with fixed $\lambda$, highlighting the method's potential for scalable learning in high-dimensional systems.
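The single-particle idea can be illustrated with a minimal one-dimensional sketch. It assumes the SI process has the form $dX_t = b(X_t, m_t)\,dt + \sigma\,dW_t$, $dm_t = \lambda(\delta_{X_t} - m_t)\,dt$, and uses a hypothetical Curie–Weiss-type drift $b(x, m) = -V'(x) - K\bigl(x - \int y\, m(dy)\bigr)$ with double-well $V(x) = x^4/4 - x^2/2$; the paper's actual example, normalization and parameters may differ, and the function names are made up for illustration. Because this drift sees $m_t$ only through its mean $\mu_t$, and the measure equation gives $d\mu_t = \lambda(X_t - \mu_t)\,dt$, one particle plus one scalar is enough.

```python
import math
import random


def grad_double_well(x: float) -> float:
    """V'(x) for the hypothetical double-well potential V(x) = x**4 / 4 - x**2 / 2."""
    return x**3 - x


def simulate_si_curie_weiss(
    x0: float,
    mu0: float,
    lam: float,
    coupling: float = 1.0,
    sigma: float = 0.5,
    h: float = 1e-2,
    n_steps: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Euler-Maruyama sketch of one self-interacting particle (hypothetical example).

    Drift: b(x, m) = -V'(x) - coupling * (x - mean of m). Since dm = lam (delta_X - m) dt
    implies d(mean) = lam (X - mean) dt, only the scalar mean mu is tracked.
    Returns (X_T, mu_T) with T = n_steps * h.
    """
    rng = random.Random(seed)
    x, mu = x0, mu0
    for _ in range(n_steps):
        drift = -grad_double_well(x) - coupling * (x - mu)
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over one step
        # Tuple assignment: both updates use the old (x, mu), i.e. explicit Euler.
        x, mu = x + drift * h + sigma * dw, mu + lam * (x - mu) * h
    return x, mu
```

With `sigma=0.0` and $X_0 = \mu_0 = \pm 1$ the state sits at a fixed point ($V'(\pm 1) = 0$ and $X = \mu$), so the function returns its input unchanged. A smaller `lam` makes $\mu_t$ average over a longer window of roughly $1/\lambda$ time units.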

Abstract

For a certain class of McKean-Vlasov processes, we introduce proxy processes that substitute the mean-field interaction with self-interaction, employing a weighted occupation measure. Our study encompasses two key achievements. First, we demonstrate the ergodicity of the self-interacting dynamics, under broad conditions, by applying the reflection coupling method. Second, in scenarios where the drifts are negative intrinsic gradients of convex mean-field potential functionals, we use entropy and functional inequalities to demonstrate that the stationary measures of the self-interacting processes approximate the invariant measures of the corresponding McKean-Vlasov processes. As an application, we show how to learn the optimal weights of a two-layer neural network by training a single neuron.
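To make "weighted occupation measure" concrete, assume (as the TL;DR's "exponentially weighted" suggests, though the exact normalization is not shown in this summary) that the measure solves $dm_t = \lambda(\delta_{X_t} - m_t)\,dt$. It then has the closed form below, stays a probability measure, and indeed satisfies the assumed equation:

```latex
m_t = e^{-\lambda t}\, m_0 + \lambda \int_0^t e^{-\lambda (t-s)}\, \delta_{X_s}\, ds,
\qquad
m_t(\mathds R^d) = e^{-\lambda t} + \lambda \int_0^t e^{-\lambda (t-s)}\, ds = 1,

\frac{d}{dt} m_t = -\lambda e^{-\lambda t} m_0 + \lambda\, \delta_{X_t}
  - \lambda^2 \int_0^t e^{-\lambda (t-s)}\, \delta_{X_s}\, ds
  = \lambda \bigl( \delta_{X_t} - m_t \bigr).
```

The weight on the position at time $s$ decays like $e^{-\lambda(t-s)}$, so $m_t$ effectively averages the trajectory over the last $1/\lambda$ time units. This is the intuition behind letting a single particle's history stand in for the McKean–Vlasov law when $\lambda$ is small.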

Paper Structure

This paper contains 19 sections, 10 theorems, 210 equations, 1 figure, and 1 algorithm.

Key Result

Theorem 1

Suppose that Assumption assu:si-contraction holds. Let $(X_t, m_t)_{t \geqslant 0}$ and $(X'_t, m'_t)_{t \geqslant 0}$ be two processes following the self-interacting dynamics eq:si for some $\lambda > 0$, such that the first marginals of their initial values, $X_0$ and $X'_0$, have finite first moments. Define the following metric and denote the corresponding Wasserstein distance on $\mathcal{P}_1\bigl(\mathds R^d \times \cdots\bigr)$ …
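Theorem 1 rests on reflection coupling: the noise driving $X'$ is the mirror image of the noise driving $X$ across the hyperplane orthogonal to $e = (X - X')/|X - X'|$. This reverses the noise component along $e$, so $|X - X'|$ behaves like a one-dimensional diffusion that can hit zero even without a contractive drift. The sketch below is only a rough numerical picture, not the paper's construction. Its assumptions, all hypothetical: the drift $b(x, m) = -\nabla V(x) - K\bigl(x - \int y\, m(dy)\bigr)$, $m_0 = \delta_{X_0}$, a switch to synchronous coupling below `sync_threshold` (a common discretized approximation), and a measure component compared only through the Euclidean distance of means, which is a crude stand-in for the paper's metric (truncated in this summary). The function names are also hypothetical.

```python
import math
import random
from typing import Callable

Vector = list[float]


def reflect(dw: Vector, e: Vector) -> Vector:
    """Mirror the noise increment: (I - 2 e e^T) dw, for a unit vector e."""
    dot = sum(ei * wi for ei, wi in zip(e, dw))
    return [wi - 2.0 * dot * ei for wi, ei in zip(dw, e)]


def si_step(x, mu, dw, grad_v, lam, coupling, sigma, h):
    """One Euler step of the hypothetical SI drift b(x, mu) = -grad V(x) - K (x - mu)."""
    g = grad_v(x)
    new_x = [xi - (gi + coupling * (xi - mi)) * h + sigma * wi
             for xi, gi, mi, wi in zip(x, g, mu, dw)]
    new_mu = [mi + lam * (xi - mi) * h for xi, mi in zip(x, mu)]
    return new_x, new_mu


def coupled_distance(
    x: Vector,
    x_prime: Vector,
    grad_v: Callable[[Vector], Vector],
    lam: float = 0.1,
    coupling: float = 1.0,
    sigma: float = 1.0,
    h: float = 1e-2,
    n_steps: int = 10_000,
    sync_threshold: float = 1e-2,
    seed: int = 0,
) -> float:
    """Simulate two coupled SI copies; return |X_T - X'_T| + |mu_T - mu'_T|,
    where mu is the mean of each copy's occupation measure."""
    rng = random.Random(seed)
    mu, mu_prime = list(x), list(x_prime)  # m_0 = delta_{X_0} for both copies
    for _ in range(n_steps):
        diff = [a - b for a, b in zip(x, x_prime)]
        r = math.sqrt(sum(d * d for d in diff))
        dw = [rng.gauss(0.0, math.sqrt(h)) for _ in x]
        if r > 0.0 and r >= sync_threshold:
            dw_prime = reflect(dw, [d / r for d in diff])  # reflection coupling
        else:
            dw_prime = dw  # synchronous coupling once the copies are close
        x, mu = si_step(x, mu, dw, grad_v, lam, coupling, sigma, h)
        x_prime, mu_prime = si_step(x_prime, mu_prime, dw_prime, grad_v,
                                    lam, coupling, sigma, h)
    return math.dist(x, x_prime) + math.dist(mu, mu_prime)
```

With `sync_threshold=float("inf")` the two copies always share the same noise (synchronous coupling). For a quadratic $V$ the distance then decays deterministically, which gives a simple sanity check.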

Figures (1)

  • Figure 1: Losses averaged over 100 repetitions, for fixed values of $\lambda$ and for discrete annealing.
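Figure 1 compares fixed values of $\lambda$ with discrete annealing. The sketch below is a hedged illustration of training one noisy neuron against its own occupation measure. It is not the paper's Algorithm 1, which this summary does not reproduce. Its assumptions, all hypothetical: the unit $\varphi(\theta, z) = a \tanh(wz + b)$, a half mean-squared loss with an L2 penalty `reg`, mean-field Langevin noise `sigma`, a geometric $\lambda$ schedule, and all function names and parameter values. The design choice is that the network output $f_m(z) = \int \varphi(\theta, z)\, m(d\theta)$ is linear in $m$. The occupation measure therefore never has to be stored, and the discretized update $m \leftarrow (1 - \lambda h)\, m + \lambda h\, \delta_\theta$ is applied directly to the vector of outputs on the training inputs.

```python
import math
import random


def neuron(theta: list[float], z: float) -> float:
    """Hypothetical unit phi(theta, z) = a * tanh(w * z + b), theta = (a, w, b)."""
    a, w, b = theta
    return a * math.tanh(w * z + b)


def neuron_grad(theta: list[float], z: float) -> list[float]:
    """Gradient of phi(theta, z) with respect to (a, w, b)."""
    a, w, b = theta
    t = math.tanh(w * z + b)
    s = 1.0 - t * t  # tanh'(u) = 1 - tanh(u)^2
    return [t, a * s * z, a * s]


def update_output(f_vals, phi_vals, lam, h):
    """Push m <- (1 - lam*h) m + lam*h delta_theta through the linear map m -> f_m."""
    return [(1.0 - lam * h) * f + lam * h * p for f, p in zip(f_vals, phi_vals)]


def annealing_schedule(lam0: float, factor: float, n_stages: int) -> list[float]:
    """Hypothetical geometric schedule lam0, lam0*factor, lam0*factor**2, ..."""
    return [lam0 * factor**k for k in range(n_stages)]


def half_mse(f_vals, ys):
    return sum((f - y) ** 2 for f, y in zip(f_vals, ys)) / (2 * len(ys))


def train_single_neuron(zs, ys, lambdas, steps_per_stage=2000, h=0.05,
                        sigma=0.05, reg=1e-3, seed=0):
    """Train one noisy neuron against the output of its own occupation measure."""
    rng = random.Random(seed)
    n = len(zs)
    theta = [rng.gauss(0.0, 1.0) for _ in range(3)]
    f_vals = [neuron(theta, z) for z in zs]  # m_0 = delta_{theta_0}
    for lam in lambdas:  # one stage per value of lam (discrete annealing)
        for _ in range(steps_per_stage):
            residuals = [f - y for f, y in zip(f_vals, ys)]
            # grad_theta of the first variation of the regularized loss at (m_t, theta)
            grad = [reg * c for c in theta]
            for r, z in zip(residuals, zs):
                for j, g in enumerate(neuron_grad(theta, z)):
                    grad[j] += r * g / n
            phi_old = [neuron(theta, z) for z in zs]
            # Noisy gradient step (Langevin dynamics for the single neuron)
            theta = [c - g * h + sigma * rng.gauss(0.0, math.sqrt(h))
                     for c, g in zip(theta, grad)]
            f_vals = update_output(f_vals, phi_old, lam, h)  # occupation update
    return theta, half_mse(f_vals, ys)
```

Per the TL;DR, a smaller $\lambda$ improves the approximation of the mean-field limit. A larger $\lambda$, by contrast, lets the output track the neuron faster, since its memory is about $1/\lambda$ time units. A decreasing schedule is one way to balance the two.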

Theorems & Definitions (30)

  • Definition 1: Modulus of continuity
  • Definition 2: Semi-monotonicity
  • Theorem 1
  • Remark 1: On the assumption
  • Remark 2: Rate of convergence
  • Example 1: Two-body interaction
  • Example 2: $\mathcal{C}^1$ functional
  • Definition 3
  • Corollary 2
  • Remark 3
  • ...and 20 more
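Example 1 (two-body interaction) is listed here without its statement. For illustration only, assume a typical one-dimensional two-body drift $b(x, m) = -V'(x) - \int W'(x - y)\, m(dy)$. When the drift does not reduce to finitely many moments of $m$, the weighted occupation measure can be stored as weighted atoms. The class `WeightedOccupation`, its pruning rule, and `two_body_drift` are hypothetical and not taken from the paper.

```python
from typing import Callable


class WeightedOccupation:
    """Exponentially weighted occupation measure stored as weighted atoms (1-D).

    Each step applies m <- (1 - lam*h) m + lam*h delta_x, the explicit Euler
    discretization of dm = lam (delta_X - m) dt. This preserves total mass 1;
    renormalization only compensates for pruned atoms and rounding.
    """

    def __init__(self, x0: float, lam: float, h: float, prune: float = 1e-8):
        self.atoms = [x0]
        self.weights = [1.0]  # m_0 = delta_{x0}
        self.lam, self.h = lam, h
        self.prune = min(prune, lam * h)  # never prune the newest atom

    def push(self, x: float) -> None:
        decay = 1.0 - self.lam * self.h
        self.weights = [w * decay for w in self.weights]
        self.atoms.append(x)
        self.weights.append(self.lam * self.h)
        kept = [(a, w) for a, w in zip(self.atoms, self.weights) if w >= self.prune]
        total = sum(w for _, w in kept)
        self.atoms = [a for a, _ in kept]
        self.weights = [w / total for _, w in kept]

    def integrate(self, f: Callable[[float], float]) -> float:
        """Return the integral of f against the stored measure."""
        return sum(w * f(a) for a, w in zip(self.atoms, self.weights))


def two_body_drift(x: float, m: WeightedOccupation,
                   grad_v: Callable[[float], float],
                   grad_w: Callable[[float], float]) -> float:
    """Assumed two-body drift b(x, m) = -V'(x) - integral of W'(x - y) m(dy)."""
    return -grad_v(x) - m.integrate(lambda y: grad_w(x - y))
```

The pruning floor bounds memory at roughly $\log(\mathrm{prune}/(\lambda h)) / \log(1 - \lambda h)$ atoms, so a smaller $\lambda$ means a longer memory and more stored atoms.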