
Synchronization on circles and spheres with nonlinear interactions

Christopher Criscitiello, Quentin Rebjock, Andrew D. McRae, Nicolas Boumal

TL;DR

The paper analyzes gradient-flow dynamics for $n$ points on the unit sphere in $\mathbb{R}^d$ that interact through a function $\varphi$ of their inner products, connecting to transformer-inspired models. For $d\ge3$, synchronization holds on connected graphs when $\varphi$ is increasing and convex (a result of Markdahl et al., 2018). On the circle ($d=2$), the paper shows that this assumption is no longer sufficient and introduces a new condition, that the Taylor coefficients of $\varphi'$ are decreasing, which ensures synchronization. It also constructs a real-analytic $\varphi$ that yields spurious non-synchronizing local maxima for all $n\ge5$, illustrating fundamental obstacles on the circle. For exponential $\varphi(t) = e^{\beta t}$, it shows benign behavior for small $\beta$ and provides large-$\beta$ criteria. Together, these results clarify the circle-sphere dichotomy and address open problems posed by Geshkovski et al. (2024) in the context of transformer-inspired gradient landscapes on manifolds.

Abstract

We consider the dynamics of $n$ points on a sphere in $\mathbb{R}^d$ ($d \geq 2$) which attract each other according to a function $\varphi$ of their inner products. When $\varphi$ is linear ($\varphi(t) = t$), the points converge to a common value (i.e., synchronize) in various connectivity scenarios: this is part of classical work on Kuramoto oscillator networks. When $\varphi$ is exponential ($\varphi(t) = e^{\beta t}$), these dynamics correspond to a limit of how idealized transformers process data, as described by Geshkovski et al. (2024). Accordingly, they ask whether synchronization occurs for exponential $\varphi$. In the context of consensus for multi-agent control, Markdahl et al. (2018) show that for $d \geq 3$ (spheres), if the interaction graph is connected and $\varphi$ is increasing and convex, then the system synchronizes. What is the situation on circles ($d=2$)? First, we show that $\varphi$ being increasing and convex is no longer sufficient. Then we identify a new condition (that the Taylor coefficients of $\varphi'$ are decreasing) under which we do have synchronization on the circle. In so doing, we provide some answers to the open problems posed by Geshkovski et al. (2024).
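The dynamics described in the abstract can be sketched numerically on the circle ($d=2$). The snippet below is a minimal illustration, not the authors' code: it assumes a complete interaction graph with uniform weights, the energy $f(\theta) = \tfrac12 \sum_{i,j} \varphi(\cos(\theta_i - \theta_j))$ with $\varphi(t) = e^{\beta t}$, and a plain explicit Euler discretization of the gradient flow. The function names, step size, and number of steps are illustrative choices.

```python
import numpy as np

def phi_prime(t, beta):
    """Derivative of the assumed interaction phi(t) = exp(beta * t)."""
    return beta * np.exp(beta * t)

def angular_velocity(theta, beta):
    """Gradient of f(theta) = 0.5 * sum_{i,j} phi(cos(theta_i - theta_j)).

    Component i is sum_j phi'(cos(theta_j - theta_i)) * sin(theta_j - theta_i).
    With phi(t) = t this reduces to the classical Kuramoto model.
    """
    diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
    return np.sum(phi_prime(np.cos(diff), beta) * np.sin(diff), axis=1)

def simulate(theta0, beta=1.0, step=0.01, num_steps=5000):
    """Explicit Euler steps of the gradient (ascent) flow on the angles."""
    theta = np.array(theta0, dtype=float)
    for _ in range(num_steps):
        theta = theta + step * angular_velocity(theta, beta)
    return theta

def order_parameter(theta):
    """Modulus of the mean of exp(i * theta); equals 1 exactly at synchrony."""
    return np.abs(np.mean(np.exp(1j * theta)))

# Illustrative run from a random initial configuration (assumed setup).
rng = np.random.default_rng(0)
theta_start = rng.uniform(0.0, 2.0 * np.pi, size=8)
theta_end = simulate(theta_start, beta=1.0)
print("order parameter at start:", order_parameter(theta_start))
print("order parameter at end:  ", order_parameter(theta_end))
```

An order parameter close to 1 at the end of a run indicates that the points have (numerically) synchronized. Swapping in a different `phi_prime` is enough to experiment with other interaction functions.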

Paper Structure (15 sections, 19 theorems, 85 equations, 1 figure)


Key Result

Theorem 1

Fix $n \geq 1$. Assume Then, critical points of $f$ where the Hessian is negative semidefinite are global maxima of $f$. In particular: local maxima are global maxima, they are the synchronized states ($x_1 = \cdots = x_n$), and (if $\varphi$ is real-analytic) gradient flow converges to a synchronized state from almost ev

Figures (1)

  • Figure 1: A regular $n$-gon on the circle ($n = 13, d = 2$). In Section \ref{sec:circles}, we construct the function $\varphi$ by smoothing a ReLU such that $\varphi'(\cos(\theta_i - \theta_j))$ is about 1 when $\cos(\theta_i-\theta_j) > \tau$ and about 0 otherwise. Points are color-coded by $\varphi'(\cos(\theta_i - 0))$.
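The configuration in Figure 1 can be reproduced in a few lines. The sketch below is illustrative only: the paper's actual smoothing of the ReLU is not given here, so a logistic sigmoid stands in for $\varphi'$ (the derivative of a softplus-smoothed ReLU), and the threshold `tau` and width `eps` are hypothetical values. It also assumes a complete graph with uniform weights, in which case the regular $n$-gon is a critical point of the energy by symmetry for any $\varphi$; the code checks this numerically. It does not check whether the polygon is a local maximum.

```python
import numpy as np

def phi_prime(t, tau=0.5, eps=0.05):
    """Stand-in for the derivative of a smoothed ReLU (an assumption):
    close to 1 when t > tau, close to 0 when t < tau."""
    return 1.0 / (1.0 + np.exp(-(t - tau) / eps))

def gradient(theta, tau=0.5, eps=0.05):
    """Gradient of 0.5 * sum_{i,j} phi(cos(theta_i - theta_j)) in the angles."""
    diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
    return np.sum(phi_prime(np.cos(diff), tau, eps) * np.sin(diff), axis=1)

n = 13
polygon = 2.0 * np.pi * np.arange(n) / n  # regular n-gon on the circle

# Color value of each point as in the caption: phi'(cos(theta_i - 0)).
colors = phi_prime(np.cos(polygon - 0.0))

# Largest gradient component; it vanishes up to rounding error.
grad_norm = np.max(np.abs(gradient(polygon)))
print("max |gradient| at the regular 13-gon:", grad_norm)
```

Points near angle 0 get a color value near 1 and points on the opposite side get a value near 0, matching the description in the caption.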

Theorems & Definitions (35)

  • Theorem 1: Spheres (Markdahl et al., 2018)
  • Corollary 2
  • Theorem 3: Circles
  • Theorem 4
  • Corollary 5: Small $\beta$
  • Theorem 6
  • Corollary 7: Large $\beta$ (Geshkovski et al., 2024)
  • Remark 8
  • Lemma 9
  • Lemma 10
  • ...and 25 more