Synchronization of mean-field models on the circle

Yury Polyanskiy, Philippe Rigollet, Andrew Yao

TL;DR

This work develops a general, quantifiable synchronization criterion for mean-field dynamics on the circle $S^1$, expressed through the $L_1$-norm of the third derivative of the interaction function. By recasting the dynamics as gradient ascent on an energy and employing Łojasiewicz and center-stable manifold theory, the authors show that, under a sharp inequality, every stable stationary point must be synchronized, which yields global convergence to the synchronized state. The framework is then applied to transformer-inspired self-attention dynamics, proving global synchronization for a broad range of the parameter $\beta$ (all $\beta \ge -0.16$) and identifying non-synchronization regimes for strongly negative $\beta$ ($\beta < -2/3$) under certain network sizes. Extensions include normalized dynamics and generalized weighted mean-field systems, which preserve the convergence guarantees under analogous conditions. The results provide rigorous insight into clustering and oversmoothing phenomena in simplified transformer models and related mean-field systems.

Abstract

This paper considers a mean-field model of $n$ interacting particles whose state space is the unit circle, a generalization of the classical Kuramoto model. Global synchronization is said to occur if, after starting from almost any initial state, all particles coalesce to a common point on the circle. We propose a general synchronization criterion in terms of the $L_1$-norm of the third derivative of the particle interaction function. As an application, we resolve a conjecture for the so-called self-attention dynamics (a stylized model of transformers) by showing synchronization for all $\beta \ge -0.16$, which significantly extends the previous bound of $0 \le \beta \le 1$ from Criscitiello, Rebjock, McRae, and Boumal (2024). We also show that global synchronization does not occur when $\beta < -2/3$.
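
To make the notion of synchronization concrete, the following is a minimal numerical sketch, not the authors' code. It assumes the mean-field system has the form $\dot{x}_i = \frac{1}{n}\sum_{j} f(x_j - x_i)$ (the page does not reproduce the paper's equation, so this form is an assumption) and uses the self-attention interaction $f(x) = \sin(x)e^{\beta(\cos(x)-1)}$ quoted in the Figure 1 caption. The names `f_beta`, `simulate`, and `order_parameter`, the forward-Euler scheme, and the step size are illustrative choices.

```python
import numpy as np

def f_beta(x, beta):
    """Interaction function sin(x) * exp(beta * (cos(x) - 1))."""
    return np.sin(x) * np.exp(beta * (np.cos(x) - 1.0))

def simulate(x0, beta, dt=0.05, steps=2000):
    """Forward-Euler integration of the ASSUMED system
    dx_i/dt = (1/n) * sum_j f_beta(x_j - x_i)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        diffs = x[None, :] - x[:, None]  # diffs[i, j] = x_j - x_i
        x = x + dt * f_beta(diffs, beta).mean(axis=1)
    return x

def order_parameter(x):
    """Modulus of the mean of exp(i * x); equals 1 when all angles coincide."""
    return float(np.abs(np.exp(1j * np.asarray(x)).mean()))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Initial angles drawn inside an arc shorter than a half circle,
    # where the particles contract toward each other for this f.
    x0 = rng.uniform(-1.0, 1.0, size=16)
    x_final = simulate(x0, beta=1.0)
    print("order parameter:", order_parameter(x_final))
```

An order parameter close to 1 indicates that the particles have coalesced. Drawing `x0` uniformly from the whole circle probes the "almost any initial state" regime of the abstract, although a finite-time simulation can only suggest, not prove, convergence.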

Paper Structure

This paper contains 12 sections, 31 theorems, 126 equations, 2 figures.

Key Result

Theorem 2.1

Consider the mean-field model eq:generalsystem2 on $\mathbb{T}$. Let $\tau\in(0,\pi]$ satisfy $f'(x) < 0$ for all $x\not\in[-\tau,\tau]$. If then every stationary point $(x_1, \ldots, x_n)$ of the system eq:generalsystem2 on $\mathbb{T}^n$ is either locally unstable or synchronized (i.e. $x_1=\cdots=x_n$).

Figures (2)

  • Figure 1: The figure plots the synchronization ratio $\left\langle|f'''|_+, \tau\right\rangle_{L_2}^{-1}4\left(1+\frac{\tau}{\pi}\right) f'(0)$ from \ref{cor:main} with $f(x)$ set as $\sin(x)e^{\beta(\cos(x)-1)}$ and $M$ set as $\pi$. A ratio greater than one indicates that we have determined that global synchronization occurs.
  • Figure 2: The function $f_\beta$ (red) and its derivative $f'_\beta$ (blue) for $\beta=2$. The parameter $\tau=\tau(\beta)$ is defined as the unique solution to the equation $f'_\beta(\tau)=0$ over $[0, \pi]$. For $\beta=2$, $\tau\simeq 0.6749$.
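
The value of $\tau(\beta)$ in the Figure 2 caption can be reproduced with a short sketch. It takes $f_\beta(x) = \sin(x)e^{\beta(\cos(x)-1)}$ from the Figure 1 caption, so that $f'_\beta(x) = e^{\beta(\cos(x)-1)}\left(\cos(x) - \beta\sin^2(x)\right)$, and finds the zero of $f'_\beta$ on $[0,\pi]$ in two ways: by bisection, and by solving the quadratic $\beta c^2 + c - \beta = 0$ for $c = \cos(\tau)$. The function names and the tolerance are illustrative assumptions, not taken from the paper.

```python
import math

def f_beta_prime(x, beta):
    """Derivative of f_beta(x) = sin(x) * exp(beta * (cos(x) - 1))."""
    return math.exp(beta * (math.cos(x) - 1.0)) * (
        math.cos(x) - beta * math.sin(x) ** 2
    )

def tau_bisect(beta, tol=1e-12):
    """Zero of f_beta' on [0, pi] by bisection.
    Relies on f_beta'(0) = 1 > 0 and f_beta'(pi) = -exp(-2 * beta) < 0."""
    lo, hi = 0.0, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_beta_prime(mid, beta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tau_closed_form(beta):
    """Solve beta * c**2 + c - beta = 0 for c = cos(tau); requires beta != 0."""
    c = (-1.0 + math.sqrt(1.0 + 4.0 * beta ** 2)) / (2.0 * beta)
    return math.acos(c)

if __name__ == "__main__":
    print(tau_bisect(2.0), tau_closed_form(2.0))
```

For $\beta = 2$ both routines agree with the caption's $\tau \simeq 0.6749$. The closed form is undefined at $\beta = 0$, where $f'_0(x) = \cos(x)$ and $\tau = \pi/2$; the bisection routine handles that case directly.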

Theorems & Definitions (64)

  • Definition 1.1
  • Theorem 2.1
  • Lemma 2.2: mathpersp25
  • Theorem 2.3
  • Corollary 2.4
  • Corollary 2.5
  • Corollary 2.6
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • ...and 54 more