Learning interacting particle systems from unlabeled data

Viska Wei, Fei Lu

Abstract

Learning the potentials of interacting particle systems is a fundamental task across various scientific disciplines. A major challenge is that unlabeled data collected at discrete time points lack trajectory information due to limitations in data collection methods or privacy constraints. We address this challenge by introducing a trajectory-free self-test loss function that leverages the weak-form stochastic evolution equation of the empirical distribution. The loss function is quadratic in potentials, supporting parametric and nonparametric regression algorithms for robust estimation that scale to large, high-dimensional systems with big data. Systematic numerical tests show that our method outperforms baseline methods that regress on trajectories recovered via label matching, tolerating large observation time steps. We establish the convergence of parametric estimators as the sample size increases, providing a theoretical foundation for the proposed approach.
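
For concreteness, a standard model in this class is the first-order interacting particle system with a confining potential $V$ and a pairwise interaction potential $\Phi$ (the notation matches Figure 1; this display is a hedged sketch of the usual setting, not a quotation of the paper's exact model):

\[
dX^i_t = -\nabla V(X^i_t)\,dt - \frac{1}{N}\sum_{j=1}^{N} \nabla\Phi\big(X^i_t - X^j_t\big)\,dt + \sigma\, dW^i_t, \qquad i = 1,\dots,N,
\]

with independent Brownian motions $W^1,\dots,W^N$. In the unlabeled setting, the data are snapshots of the empirical distribution $\mu^N_t = \frac{1}{N}\sum_{i=1}^{N}\delta_{X^i_t}$ at discrete times, with no labels linking particle positions across times.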

Paper Structure

This paper contains 43 sections, 8 theorems, 105 equations, 7 figures, 7 tables, and 2 algorithms.

Key Result

Theorem 4.2

Under Assumption \ref{ass:parametric_theory}, for every $\eta\in(0,1)$, with probability at least $1-\eta$ the estimation error is bounded by $C\big(\Delta t + M^{-1/2}\big)$, provided $M\ge M_0/\eta$ and $\Delta t\le \Delta_0$, for constants $C$, $M_0$, and $\Delta_0$ independent of $M$ and $\Delta t$.
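
Read alongside Figure 2 and Corollary 4.3, the error bounds for the two time-integration rules take the following form (assembled from the figure captions rather than quoted from the theorem statements):

\[
\text{error} \lesssim
\begin{cases}
\Delta t + M^{-1/2}, & \text{Riemann-sum time integration (Theorem 4.2)},\\
(\Delta t)^2 + M^{-1/2}, & \text{trapezoidal time integration (Corollary 4.3)},
\end{cases}
\]

so a more accurate time quadrature lowers the $\Delta t$-bias floor seen in the left pair of panels of Figure 2, while the $O(M^{-1/2})$ statistical rate is unchanged.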

Figures (7)

  • Figure 1: Workflow of both estimation algorithms using the self-test loss function. Left: Least squares regression expands $V$ and $\Phi$ in prescribed basis functions (or RBFs) and solves for the minimizer via least squares with regularization (a minimal sketch of this branch appears after this list). Right: Neural network regression parameterizes $V_\theta$ and $\Phi_\theta$ by neural networks, computes derivatives via automatic differentiation, and minimizes the loss using stochastic gradient descent.
  • Figure 2: $M$-scaling on the reference model under Riemann-sum (left pair) and trapezoidal (right pair) time integrations. Both use data generated with step $\delta t = 10^{-4}$ and time integration at the various $\Delta t$ values shown. The Riemann sum has error bound $O(\Delta t + M^{-1/2})$, so the four $\Delta t$ values track the $O(M^{-1/2})$ rate (green line) until they saturate at an $O(\Delta t)$ floor (left pair), while the trapezoidal rule has error bound $O((\Delta t)^2 + M^{-1/2})$, so all four $\Delta t$ values track the $O(M^{-1/2})$ rate without saturation (right pair). Means and standard deviations are computed over 10 trials per point; detailed numbers are in Table \ref{tab:M_scaling}.
  • Figure 3: Convergence for the discrete-time model (i.e., $\delta t = \Delta t$, zero gap). The intrinsic $O(\Delta t)$ bias of the discrete-time model (see Section \ref{sec:discretization}) dominates the error when $\Delta t$ is large, and the convergence in $M$ is visible only at $\Delta t \le 10^{-3}$ (and $\Delta t = 10^{-4}$ for $\Phi$), where the bias is small enough for the $O(M^{-1/2})$ statistical error to dominate. See also Table \ref{tab:M_scaling}.
  • Figure 4: Condition numbers scaling with $N$ for the normal matrix and its diagonal blocks ($\kappa_*$, $\kappa_{VV}$, $\kappa_{\Phi\Phi}$). Slopes (italic numbers at $N{=}100$) are log-log regression rates from Table \ref{tab:cond_number}. Dashed/dotted green lines show the $O(N)$ theoretical rates. Left: $\kappa_*$ increases in $N$, with a rate near $O(N)$ when $d=10$. Center: $\kappa_{VV}$ is $N$-independent. Right: $\kappa_{\Phi\Phi}$ has a rate near $O(N)$ when $d=10$ due to distance concentration.
  • Figure 5: Non-radial potential recovery ($d{=}2$). Percentages are relative $L^2(\rho)$ errors.
  • ...and 2 more figures
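
To make the least-squares branch of Figure 1 concrete, the sketch below shows the generic structure it reduces to once $V$ and $\Phi$ are expanded in basis functions: because the self-test loss is quadratic in the potentials (per the abstract), minimizing over the stacked coefficient vector amounts to solving a regularized normal system. The matrix `G` and vector `y` below are synthetic stand-ins; assembling the actual normal system from unlabeled snapshots via the weak-form evolution equation is the paper's method and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a quadratic loss L(c) = ||G c - y||^2, where c stacks
# the basis coefficients of V and Phi. G is made mildly ill-conditioned to
# mimic the normal-matrix conditioning studied in Figure 4.
n = 20                                        # number of basis coefficients
G = rng.normal(size=(500, n)) @ np.diag(np.logspace(0, -4, n))
c_true = rng.normal(size=n)
y = G @ c_true + 0.01 * rng.normal(size=500)  # noisy synthetic "observations"

A = G.T @ G                                   # normal matrix
b = G.T @ y

# Tikhonov-regularized normal equations: (A + lam * I) c = b.
# The scale-aware choice of lam is a common heuristic, not the paper's.
lam = 1e-6 * np.trace(A) / n
c_hat = np.linalg.solve(A + lam * np.eye(n), b)

print("relative coefficient error:",
      np.linalg.norm(c_hat - c_true) / np.linalg.norm(c_true))
```

The regularization term is what keeps the solve stable as the normal matrix's condition number grows; Figure 4's roughly $O(N)$ growth of $\kappa_*$ suggests why some regularization is needed for large systems.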

Theorems & Definitions (14)

  • Remark 2.1: Negative minimum of the loss
  • Remark 2.2: Numerical integration in time
  • Remark 2.3: Mean-field limit of the loss function
  • Remark 2.4: Energy balance-based loss function
  • Theorem 4.2: Error bound
  • Corollary 4.3: Trapezoidal error bound
  • Lemma B.1: Continuum expectation normal equation
  • Lemma B.2: Interaction coercivity
  • Proposition B.3: Coercivity for joint estimation
  • Remark B.4: Negative minimum
  • ...and 4 more