The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle

Dibakar Sigdel

Abstract

Transformer models have redefined sequence learning, yet dot-product self-attention imposes a quadratic token-mixing bottleneck on long-context time series. We introduce the \textbf{Phasor Transformer} block, a phase-native alternative that represents sequence states on the unit-circle manifold $S^1$. Each block combines lightweight trainable phase shifts with parameter-free Discrete Fourier Transform (DFT) token coupling, achieving global $\mathcal{O}(N\log N)$ mixing without explicit attention maps. Stacking these blocks defines the \textbf{Large Phasor Model (LPM)}. We validate LPM on autoregressive time-series prediction over synthetic multi-frequency benchmarks. With a compact parameter budget, LPM learns stable global dynamics and forecasts competitively with conventional self-attention baselines. Our results establish an explicit efficiency-performance frontier, demonstrating that large-model scaling for time series can emerge from geometry-constrained phase computation with deterministic global coupling, offering a practical path toward scalable temporal modeling in oscillatory domains.

Paper Structure (25 sections, 4 theorems, 20 equations, 5 figures, 3 tables)

Key Result

Proposition 2.1

Let $\boldsymbol{z}\in\mathbb{T}^N$ and $F_T\in U(T)$. Then $\|F_T\boldsymbol{z}\|_2=\|\boldsymbol{z}\|_2$, whereas in general $F_T\boldsymbol{z}\notin\mathbb{T}^N$, because the coordinatewise constraints $|(F_T\boldsymbol{z})_k|=1$ need not hold.
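
The proposition can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes $F_T$ is the DFT matrix scaled by $1/\sqrt{T}$ (so that it is unitary) and takes $N=T=8$. The helper names `unitary_dft` and `energy` are hypothetical, introduced only for this illustration.

```python
import cmath
import math
import random


def unitary_dft(z):
    """Naive O(T^2) unitary DFT: y_k = T**-0.5 * sum_n z_n * exp(-2j*pi*k*n/T)."""
    T = len(z)
    scale = 1.0 / math.sqrt(T)
    return [
        scale * sum(z[n] * cmath.exp(-2j * math.pi * k * n / T) for n in range(T))
        for k in range(T)
    ]


def energy(v):
    """Squared Euclidean norm of a complex vector."""
    return sum(abs(x) ** 2 for x in v)


T = 8  # assumed sequence length for this illustration

# Deterministic case: all phases zero, so every coordinate lies on the unit circle.
ones = [1 + 0j] * T
ones_mixed = unitary_dft(ones)
ones_moduli = [abs(x) for x in ones_mixed]  # sqrt(T) at k = 0, ~0 elsewhere

# Random phases (fixed seed) give a generic point of the torus.
rng = random.Random(0)
z = [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(T)]
y = unitary_dft(z)

print("energy in/out:", energy(z), energy(y))
print("moduli after mixing:", [round(abs(x), 3) for x in y])
```

In the all-ones case the energy stays at $T$, but it concentrates in the $k=0$ coordinate, so the mixed vector leaves the torus even though its norm is unchanged.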

Figures (5)

  • Figure 1: Single-block Phasor Transformer used in LPM. Global token interaction is induced by deterministic DFT interference ($F_T$), while learnable pre/post shift layers provide lightweight phase adaptation.
  • Figure 2: Multi-stack LPM transformer schematic. Each block applies pre-shift, DFT token mixing, and post-shift operations, followed by pull-back normalization before the next block.
  • Figure 3: Phasor Transformer performance on sequence benchmarking, detailing the learning convergence and interpolation prediction capabilities.
  • Figure 4: Empirical evaluation comparing the predictive capability (MAE) and training capacity of an $S^1$ Phasor network relative to a deep Euclidean parameter space.
  • Figure 5: Generative Autoregressive rollout displaying independent extended interpolation capability following deep $D=3$ optimization.
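
The block pipeline described in the captions of Figures 1 and 2 (pre-shift, DFT token mixing, post-shift, pull-back normalization) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the shifts are assumed to be elementwise multiplications by $e^{i\theta}$ with one trainable angle per position, the pull-back is assumed to be the projection $y_k \mapsto y_k/|y_k|$, and the phase encoding of the input is assumed. All function names are hypothetical. A naive $\mathcal{O}(T^2)$ DFT keeps the sketch dependency-free; an FFT routine such as `numpy.fft.fft(..., norm="ortho")` would supply the $\mathcal{O}(N\log N)$ mixing.

```python
import cmath
import math
import random


def dft_mix(z):
    """Parameter-free token mixing with a naive unitary DFT."""
    T = len(z)
    scale = 1.0 / math.sqrt(T)
    return [
        scale * sum(z[n] * cmath.exp(-2j * math.pi * k * n / T) for n in range(T))
        for k in range(T)
    ]


def phase_shift(z, theta):
    """Assumed form of a shift layer: rotate each token by its own angle."""
    return [v * cmath.exp(1j * t) for v, t in zip(z, theta)]


def pull_back(y, eps=1e-12):
    """Assumed pull-back: project each coordinate onto the unit circle."""
    return [v / abs(v) if abs(v) > eps else 1 + 0j for v in y]


def phasor_block(z, theta_pre, theta_post):
    """Pre-shift -> DFT mixing -> post-shift -> pull-back normalization."""
    mixed = dft_mix(phase_shift(z, theta_pre))
    return pull_back(phase_shift(mixed, theta_post))


def lpm_forward(z, params):
    """Apply a stack of blocks; params is a list of (theta_pre, theta_post)."""
    for theta_pre, theta_post in params:
        z = phasor_block(z, theta_pre, theta_post)
    return z


T, D = 16, 3  # sequence length and depth chosen for illustration
rng = random.Random(0)
params = [
    (
        [rng.uniform(-math.pi, math.pi) for _ in range(T)],
        [rng.uniform(-math.pi, math.pi) for _ in range(T)],
    )
    for _ in range(D)
]

# Assumed encoding: map real samples in [-1, 1] to phases in [-pi, pi].
signal = [math.sin(2 * math.pi * 3 * n / T) for n in range(T)]
z0 = [cmath.exp(1j * math.pi * s) for s in signal]

out = lpm_forward(z0, params)
n_params = sum(len(pre) + len(post) for pre, post in params)
print("parameters:", n_params)
print("output moduli:", [round(abs(v), 6) for v in out])
```

Under these assumptions each block carries $2T$ trainable angles, so a depth-$D$ stack has $2TD$ parameters, and every inter-block state stays on the unit circle because of the pull-back step.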

Theorems & Definitions (7)

  • Definition 2.1: Phasor Token State Manifold
  • Definition 2.2: Ambient Interference Space
  • Proposition 2.1: Spectral Mixing Preserves Energy, Not Coordinatewise Modulus
  • Definition 2.3: Phasor Transformer Block
  • Theorem 2.1: Linear-Parameter Global Mixing in LPM
  • Proposition 2.2: Pull-Back Boundedness for Inter-Block States
  • Corollary 2.2: Parameter-Efficiency Regime of LPM