Synchronization of mean-field models on the circle
Yury Polyanskiy, Philippe Rigollet, Andrew Yao
TL;DR
This work develops a general, quantifiable synchronization criterion for mean-field dynamics on the circle $S^1$, expressed through the $L_1$-norm of the third derivative of the interaction function. By recasting the dynamics as gradient ascent on an energy and employing Łojasiewicz and center-stable manifold theory, the authors show that, whenever this norm satisfies a sharp inequality, every stable stationary point is synchronized; this yields global convergence to the synchronized state. The framework is then applied to transformer-inspired self-attention dynamics, proving global synchronization for all $\beta \ge -0.16$ and identifying non-synchronization regimes for strongly negative $\beta$ for certain network sizes. Extensions include normalized dynamics and generalized weighted mean-field systems, which retain the convergence guarantees under analogous conditions. The results provide rigorous insight into clustering and oversmoothing phenomena in simplified transformer models and related mean-field systems.
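The gradient-ascent reformulation can be checked in a few lines. The derivation below is a sketch under assumptions not stated in this summary. It takes the pairwise form $\dot\theta_i = \frac{1}{n}\sum_j h'(\theta_i-\theta_j)$ with an even interaction function $h$ (so $h'$ is odd), together with the energy $E(\theta) = \frac{1}{2n^2}\sum_{i,j} h(\theta_i-\theta_j)$. The paper's exact normalization and sign conventions may differ.

```latex
\begin{aligned}
\frac{\partial E}{\partial \theta_i}
  &= \frac{1}{2n^2}\Big(\sum_{j} h'(\theta_i-\theta_j) - \sum_{j} h'(\theta_j-\theta_i)\Big)
   = \frac{1}{n^2}\sum_{j} h'(\theta_i-\theta_j)
   \qquad (h' \text{ odd}),\\
\dot\theta_i &= n\,\frac{\partial E}{\partial \theta_i},
\qquad
\frac{d}{dt}E(\theta(t)) = \nabla E\cdot\dot\theta = n\,\lVert\nabla E\rVert^2 \ge 0 .
\end{aligned}
```

For instance, $h(x) = \cos x$ gives the Kuramoto model $\dot\theta_i = \frac{1}{n}\sum_j \sin(\theta_j-\theta_i)$. The assumed self-attention choice $h(x) = e^{\beta\cos x}/\beta$ gives $\dot\theta_i = \frac{1}{n}\sum_j e^{\beta\cos(\theta_j-\theta_i)}\sin(\theta_j-\theta_i)$. In both cases $E$ never decreases along trajectories.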
Abstract
This paper considers a mean-field model of $n$ interacting particles whose state space is the unit circle, a generalization of the classical Kuramoto model. Global synchronization is said to occur if, starting from almost any initial state, all particles coalesce to a common point on the circle. We propose a general synchronization criterion in terms of the $L_1$-norm of the third derivative of the particle interaction function. As an application, we resolve a conjecture for the so-called self-attention dynamics (a stylized model of transformers) by showing synchronization for all $\beta \ge -0.16$, which significantly extends the previous bound of $0 \le \beta \le 1$ from Criscitiello, Rebjock, McRae, and Boumal (2024). We also show that global synchronization does not occur when $\beta < -2/3$.
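The following is a minimal numerical sketch of these dynamics. It rests on a hypothetical assumption, since the abstract does not spell out the equations: the pairwise model $\dot\theta_i = \frac{1}{n}\sum_j h'(\theta_i-\theta_j)$ with the self-attention-type interaction $h(x) = (e^{\beta\cos x}-1)/\beta$, which reduces to the Kuramoto case $h(x) = \cos x$ at $\beta = 0$. The helper names `h`, `velocity`, `energy`, `order_parameter`, and `simulate`, along with the explicit Euler step size, are illustrative choices. The sketch integrates the model for $\beta = 1$, which lies in the previously established range $0 \le \beta \le 1$. It also tracks the order parameter $r = |\frac{1}{n}\sum_j e^{i\theta_j}|$, which equals 1 exactly when all particles coincide.

```python
import math

# Hypothetical interaction (assumed, not quoted from the paper):
# h(x) = (exp(beta*cos x) - 1) / beta, with the Kuramoto limit h(x) = cos x at beta = 0.
# Subtracting the constant 1/beta does not change the dynamics.
def h(x: float, beta: float) -> float:
    if beta == 0.0:
        return math.cos(x)
    return math.expm1(beta * math.cos(x)) / beta

def h_prime(x: float, beta: float) -> float:
    # h'(x) = -sin(x) * exp(beta*cos(x)); valid for every beta, including beta = 0
    return -math.sin(x) * math.exp(beta * math.cos(x))

def velocity(theta: list[float], beta: float) -> list[float]:
    """Right-hand side d(theta_i)/dt = (1/n) * sum_j h'(theta_i - theta_j)."""
    n = len(theta)
    return [sum(h_prime(ti - tj, beta) for tj in theta) / n for ti in theta]

def energy(theta: list[float], beta: float) -> float:
    """E = (1/(2 n^2)) * sum_{i,j} h(theta_i - theta_j); the velocity equals n * grad E."""
    n = len(theta)
    return sum(h(ti - tj, beta) for ti in theta for tj in theta) / (2 * n * n)

def order_parameter(theta: list[float]) -> float:
    """|(1/n) * sum_j exp(i*theta_j)|: equals 1 exactly when all particles coincide."""
    n = len(theta)
    c = sum(math.cos(t) for t in theta) / n
    s = sum(math.sin(t) for t in theta) / n
    return math.hypot(c, s)

def simulate(theta0: list[float], beta: float, dt: float = 0.05, steps: int = 4000) -> list[float]:
    """Explicit Euler integration; dt is an illustrative choice, small enough for beta near 1."""
    theta = list(theta0)
    for _ in range(steps):
        v = velocity(theta, beta)
        theta = [(t + dt * vi) % (2 * math.pi) for t, vi in zip(theta, v)]
    return theta

if __name__ == "__main__":
    beta = 1.0
    start = [0.3 * k for k in range(5)]  # five particles inside a half-circle
    end = simulate(start, beta)
    print(f"r: {order_parameter(start):.3f} -> {order_parameter(end):.6f}")
    print(f"E: {energy(start, beta):.4f} -> {energy(end, beta):.4f}")

    spread = [2 * math.pi * k / 5 for k in range(5)]  # evenly spaced particles
    print("max |velocity| at evenly spaced state:", max(abs(v) for v in velocity(spread, beta)))
```

From the half-circle start, $r$ should approach 1 while the energy increases. The evenly spaced configuration is stationary with $r = 0$ even though it is not synchronized. This illustrates why a synchronization criterion must concern stable stationary points and "almost any" initial state rather than every initial state.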
