Neural McKean-Vlasov Processes: Distributional Dependence in Diffusion Processes

Haoming Yang, Ali Hasan, Yuting Ng, Vahid Tarokh

TL;DR

This work extends diffusion-based modeling by integrating distributional dependence through MV-SDEs and introduces three neural mean-field architectures (EM, IM, ML) to parameterize drift with respect to the state law $p_t$. It develops MLE-based, Brownian-bridge, and PDE-consistent estimation schemes, and reveals implicit regularization and attention-like connections arising from distributional terms. Empirical results on synthetic MV-SDEs, real-world time-series data, and generative tasks demonstrate that explicit distributional dependence can improve modeling of interacting systems and enable richer probability flows, without sacrificing performance on standard Itô-SDE tasks. The findings suggest MV-SDEs offer a flexible framework for modeling temporal data with interactions, with practical impact on time-series forecasting and density-based generative modeling.

Abstract

McKean-Vlasov stochastic differential equations (MV-SDEs) provide a mathematical description of the behavior of an infinite number of interacting particles by imposing a dependence on the particle density. As such, we study the influence of explicitly including distributional information in the parameterization of the SDE. We propose a series of semi-parametric methods for representing MV-SDEs, and corresponding estimators for inferring parameters from data based on the properties of the MV-SDE. We analyze the characteristics of the different architectures and estimators, and consider their applicability in relevant machine learning problems. We empirically compare the performance of the different architectures and estimators on real and synthetic datasets for time series and probabilistic modeling. The results suggest that explicitly including distributional dependence in the parameterization of the SDE is effective in modeling temporal data with interaction under an exchangeability assumption while maintaining strong performance for standard Itô-SDEs due to the richer class of probability flows associated with MV-SDEs.

Paper Structure

This paper contains 68 sections, 1 theorem, 61 equations, 14 figures, 17 tables, and 4 algorithms.

Key Result

Theorem 5.1

Suppose $f$ and $\varphi$ are known and fixed. Consider a mean-field architecture as described above with a linear structure, i.e. $B(X_t, p_t, t) = \int \varphi(X_t, y) \, \mathrm{d} p_t(y) + f(X_t, t)$. Further, assume that $\varphi$ ...
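To make the linear drift structure in the theorem statement concrete, the following is a minimal sketch, not the authors' implementation, of how the integral against $p_t$ might be approximated by an average over $N$ observed particles and then used in an Euler-Maruyama simulation. The kernel `phi` (a weak linear attraction), the drift `f` (the negative gradient of a double-well potential), the diffusion coefficient `sigma`, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def phi(x, y):
    # Hypothetical interaction kernel: weak linear attraction of x toward y.
    return -0.5 * (x - y)


def f(x, t):
    # Hypothetical state drift: negative gradient of the double-well
    # potential V(x) = (x**2 - 1)**2 / 4. It does not depend on t here.
    return x - x**3


def empirical_drift(x, t):
    # Approximates B(x_i, p_t, t) = int phi(x_i, y) dp_t(y) + f(x_i, t)
    # by replacing p_t with the empirical measure of the N particles in x.
    pairwise = phi(x[:, None], x[None, :])  # shape (N, N)
    return pairwise.mean(axis=1) + f(x, t)


def euler_maruyama(x0, sigma=0.5, dt=0.01, n_steps=500, seed=0):
    # Simulates the interacting particle system with constant diffusion sigma
    # (an assumption) and returns an array of shape (n_steps + 1, N).
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for k in range(n_steps):
        t = k * dt
        noise = rng.normal(size=x.shape)
        x = x + empirical_drift(x, t) * dt + sigma * np.sqrt(dt) * noise
        path.append(x.copy())
    return np.stack(path)


if __name__ == "__main__":
    x0 = np.linspace(-2.0, 2.0, 64)
    path = euler_maruyama(x0)
    print(path.shape)
```

With this particular linear kernel the particle average reduces to $-0.5\,(x_i - \bar{x})$, where $\bar{x}$ is the particle mean, which gives a quick sanity check on `empirical_drift`. A nonlinear `phi` would not simplify this way and would need the full pairwise average.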

Figures (14)

  • Figure 1: SDE sample paths of a double-well potential, where the particles (a) do not interact and (b) exhibit complex phase transitions as a result only of interaction through weak attraction.
  • Figure 2: MV-SDE sample paths with non-local dynamics (left) and discontinuities (right).
  • Figure 3: Schematic comparing neural architectures for modeling MV-SDEs. Implicit measure (IM) architecture uses a mean-field layer that represents particles as learned weights and computes the expectation under a learned change of measure; the empirical measure (EM) architecture computes the expectation with the observed particles; the marginal law (ML) architecture learns the particle density, and computes an empirical expectation with samples from the learned density.
  • Figure 4: Top row: sample paths from the different synthetic datasets. Middle row: mean squared error (MSE) of different architectures' performance (average of 10 runs) on drift estimation, under the effect of different levels of observation noise. Bottom row: Example of estimated gradient flow of Kuramoto model at terminal time. The colors correspond to the density of generated samples at terminal time.
  • Figure 5: Results for approximating sample paths containing jumps.
  • ...and 9 more figures

Theorems & Definitions (7)

  • Definition 3.1: Mean-field Layer
  • Remark 3.2
  • Theorem 5.1: Implicit Regularization
  • proof
  • Remark 5.2: Minimizing the Energy Distance
  • proof
  • proof