Universal Approximation of Mean-Field Models via Transformers
Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia
TL;DR
The paper studies how to learn and simulate the mean-field dynamics of permutation-equivariant particle systems using transformers. It introduces the expected transformer, which lifts a finite-sequence model to the space of measures, and proves approximation bounds for the mean-field vector field that link finite-particle learning to the infinite-dimensional dynamics. Empirical results on Cucker-Smale and fish milling data show that transformers outperform baselines at learning the vector field and generalize to larger numbers of particles, while the theory bounds the distance between the transformer-driven dynamics and the true mean-field evolution. The work thus gives data-driven models of collective behavior a rigorous link between finite transformer approximation and continuum mean-field dynamics, with applications in physics, biology, and engineering.
Abstract
This paper investigates the use of transformers to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, leading to permutation-equivariant dynamics. First, we empirically demonstrate that transformers are well-suited for approximating a variety of mean-field models, including the Cucker-Smale model for flocking and milling, and the mean-field system for training two-layer neural networks. We then support these numerical experiments with mathematical theory. Specifically, we prove that if a finite-dimensional transformer effectively approximates the finite-dimensional vector field governing the particle system, then the $L_2$ distance between the \textit{expected transformer} and the infinite-dimensional mean-field vector field can be uniformly bounded by a function of the number of particles observed during training. Leveraging this result, we establish theoretical bounds on the distance between the true mean-field dynamics and those obtained using the transformer.
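To make the permutation-equivariance property concrete, the following is a minimal sketch of a finite-particle Cucker-Smale vector field in NumPy, not the authors' implementation. The function name `cucker_smale_field`, the communication kernel $K / (1 + \|x_i - x_j\|^2)^{\beta}$, and the default values of `K` and `beta` are assumptions chosen for illustration; the paper may use a different kernel or normalization.

```python
import numpy as np


def cucker_smale_field(x, v, K=1.0, beta=0.5):
    """Illustrative finite-particle Cucker-Smale vector field.

    x, v : arrays of shape (n, d) holding positions and velocities.
    Returns (dx/dt, dv/dt), each of shape (n, d).
    The kernel and its parameters K, beta are assumed, not taken from the paper.
    """
    # Pairwise position differences: diff_x[i, j] = x_i - x_j
    diff_x = x[:, None, :] - x[None, :, :]
    # Pairwise velocity differences: diff_v[i, j] = v_j - v_i
    diff_v = v[None, :, :] - v[:, None, :]
    # Symmetric communication weights that decay with squared distance
    dist2 = np.sum(diff_x ** 2, axis=-1)
    weights = K / (1.0 + dist2) ** beta
    # Each particle relaxes toward the weighted average velocity of the others
    dv = np.mean(weights[:, :, None] * diff_v, axis=1)
    return v, dv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 2
    x = rng.normal(size=(n, d))
    v = rng.normal(size=(n, d))

    dx, dv = cucker_smale_field(x, v)

    # Permutation equivariance: relabeling the particles relabels the output
    perm = rng.permutation(n)
    dx_p, dv_p = cucker_smale_field(x[perm], v[perm])
    print(np.allclose(dx_p, dx[perm]) and np.allclose(dv_p, dv[perm]))
```

Because the field depends on the other particles only through an average over pairs, reordering the input rows reorders the output rows in the same way. A transformer layer without positional encodings has the same symmetry, which is the structural reason the abstract gives for pairing the two.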
