The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle

Dibakar Sigdel

Abstract

Transformer models have redefined sequence learning, yet dot-product self-attention introduces a quadratic token-mixing bottleneck for long-context time-series. We introduce the \textbf{Phasor Transformer} block, a phase-native alternative representing sequence states on the unit-circle manifold $S^1$. Each block combines lightweight trainable phase-shifts with parameter-free Discrete Fourier Transform (DFT) token coupling, achieving global $\mathcal{O}(N\log N)$ mixing without explicit attention maps. Stacking these blocks defines the \textbf{Large Phasor Model (LPM)}. We validate LPM on autoregressive time-series prediction over synthetic multi-frequency benchmarks. Operating with a highly compact parameter budget, LPM learns stable global dynamics and achieves competitive forecasting behavior compared to conventional self-attention baselines. Our results establish an explicit efficiency-performance frontier, demonstrating that large-model scaling for time-series can emerge from geometry-constrained phase computation with deterministic global coupling, offering a practical path toward scalable temporal modeling in oscillatory domains.

Paper Structure (25 sections, 4 theorems, 20 equations, 5 figures, 3 tables)

Introduction
Theory
Dense Euclidean Baseline and Motivation
Phasor Token States on T^N
Unitary Gate Primitives
Single-Block Phasor Transformer Operator
From Single-Stack to Multi-Stack LPM
Method
Dataset and Experimental Splits
Data Encoding
Variational Phasor Transformer Layer
Deterministic Readout
Optimization Protocol
Inference, Rollout, and Metrics
Results
...and 10 more sections

Key Result

Proposition 2.1

Let $\boldsymbol{z}\in\mathbb{T}^N$ and $F_T\in U(T)$. Then $\|F_T\boldsymbol{z}\|_2=\|\boldsymbol{z}\|_2$, while in general $F_T\boldsymbol{z}\notin\mathbb{T}^N$, because the coordinatewise constraints $|(F_T\boldsymbol{z})_k|=1$ need not hold.
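
The proposition can be checked numerically on a small example. The sketch below is illustrative and not taken from the paper: it assumes $N=T=4$ so that the product $F_T\boldsymbol{z}$ is defined, takes $F_T$ to be the DFT matrix with $1/\sqrt{N}$ normalization, and uses a hypothetical helper `unitary_dft` written as a naive $\mathcal{O}(N^2)$ sum rather than an FFT.

```python
import cmath
import math


def unitary_dft(z):
    """Apply the unitary DFT (1/sqrt(N) normalization) to a list of complex numbers.

    Naive O(N^2) evaluation, used here only for clarity; an FFT computes the
    same result in O(N log N).
    """
    n = len(z)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(z[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# A point on the torus T^4: every coordinate has modulus 1 (all phases zero here).
z = [cmath.exp(1j * 0.0) for _ in range(4)]
w = unitary_dft(z)

energy_in = sum(abs(c) ** 2 for c in z)    # squared 2-norm before mixing
energy_out = sum(abs(c) ** 2 for c in w)   # squared 2-norm after mixing
moduli_out = [abs(c) for c in w]           # coordinatewise moduli after mixing

print(energy_in, energy_out)  # both are 4 up to rounding
print(moduli_out)             # approximately [2, 0, 0, 0]: not on the torus
```

For this input the total energy is unchanged, but all of it concentrates in the first coordinate, so the mixed state leaves $\mathbb{T}^N$ even though the map is unitary.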

Figures (5)

Figure 1: Single-block Phasor Transformer used in LPM. Global token interaction is induced by deterministic DFT interference ($F_T$), while learnable pre/post shift layers provide lightweight phase adaptation.
Figure 2: Multi-stack LPM transformer schematic. Each block applies pre-shift, DFT token mixing, and post-shift operations, followed by pull-back normalization before the next block.
Figure 3: Phasor Transformer performance on sequence benchmarking, detailing the learning convergence and interpolation prediction capabilities.
Figure 4: Empirical evaluation comparing the predictive capability (MAE) and training capacity of an $S^1$ Phasor network relative to a deep Euclidean parameter space.
Figure 5: Generative Autoregressive rollout displaying independent extended interpolation capability following deep $D=3$ optimization.
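
The block structure described in the captions of Figures 1 and 2 (pre-shift, DFT token mixing, post-shift, pull-back normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `phasor_block` and `lpm_stack` are hypothetical, the pull-back is assumed to be the coordinatewise projection $z_k \mapsto z_k/|z_k|$, and the DFT is again a naive $\mathcal{O}(N^2)$ sum standing in for an FFT.

```python
import cmath
import math


def unitary_dft(z):
    """Unitary DFT with 1/sqrt(N) normalization (naive O(N^2) stand-in for an FFT)."""
    n = len(z)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(z[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def phasor_block(z, pre_shift, post_shift, eps=1e-12):
    """One block: pre-shift -> DFT mixing -> post-shift -> pull-back to the unit circle."""
    # Trainable phase shifts act as coordinatewise rotations, one angle per token.
    shifted = [c * cmath.exp(1j * a) for c, a in zip(z, pre_shift)]
    # Parameter-free global token coupling.
    mixed = unitary_dft(shifted)
    rotated = [c * cmath.exp(1j * b) for c, b in zip(mixed, post_shift)]
    # ASSUMPTION: pull-back normalization is modelled as z_k / |z_k|.
    # Coordinates with (near-)zero modulus have no defined phase; map them to phase 0.
    return [c / abs(c) if abs(c) > eps else 1 + 0j for c in rotated]


def lpm_stack(z, params):
    """Apply a sequence of blocks; params is a list of (pre_shift, post_shift) pairs."""
    for pre_shift, post_shift in params:
        z = phasor_block(z, pre_shift, post_shift)
    return z


# Usage: N = 4 tokens, depth 3, arbitrary illustrative phase parameters.
z0 = [cmath.exp(1j * p) for p in [0.1, 0.7, 2.0, -1.3]]
params = [([0.3, -0.2, 1.1, 0.5], [0.0, 0.4, -0.9, 0.2]) for _ in range(3)]
out = lpm_stack(z0, params)

n_params = sum(len(pre) + len(post) for pre, post in params)  # 2 * N * depth
out_moduli = [abs(c) for c in out]
```

In this sketch the only trainable quantities are the two shift vectors per block, so the parameter count grows as $2N$ per block, while the mixing step itself carries no parameters.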

Theorems & Definitions (7)

Definition 2.1: Phasor Token State Manifold
Definition 2.2: Ambient Interference Space
Proposition 2.1: Spectral Mixing Preserves Energy, Not Coordinatewise Modulus
Definition 2.3: Phasor Transformer Block
Theorem 2.1: Linear-Parameter Global Mixing in LPM
Proposition 2.2: Pull-Back Boundedness for Inter-Block States
Corollary 2.2: Parameter-Efficiency Regime of LPM
