Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers

Feilong Liu

TL;DR

This work offers a first-principles, signal-processing view of Rotary Positional Embeddings (RoPE) by treating them as phase modulation over a bank of complex oscillators. It derives principled, depth- and precision-aware bounds: a fundamental aliasing limit $L < 2\pi \cdot \text{base}$ and a DC-component stability bound $\text{base} \ge L/\arccos(\epsilon)$ that tightens with depth as $\text{base} \ge L/\arccos(\epsilon^{1/N})$. Together these define a Goldilocks zone for long-context transformers. It also identifies a precision-based upper bound $\text{base} < 1/\epsilon_{mach}$, revealing a hard numerical ceiling (the Precision Wall) at ultra-long context lengths. Case studies on LLaMA, Mistral, and DeepSeek show that observed successes and failures align with these bounds, explaining abrupt attention collapse when bases are insufficient and the need for spectrum shaping or higher precision for very long contexts. The findings offer principled guidance for RoPE base selection and retrofits, linking architectural choices to concrete spectral and hardware constraints and setting the stage for adaptive or higher-precision RoPE strategies in ultra-long-context models.
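The bounds above can be combined into a single feasibility check. The sketch below is illustrative, not the paper's code: the helper name `rope_base_window` is hypothetical, `eps` is assumed to be a tolerance in (0, 1) on the cosine alignment of the slowest (DC) channel, `n_layers` is assumed to play the role of $N$, and `eps_mach` is the machine epsilon of whatever format holds the phase (for example $2^{-23}$ for float32).

```python
import math

def rope_base_window(context_len, eps, n_layers, eps_mach):
    """Hypothetical helper: (lower, upper) RoPE-base bounds from the TL;DR formulas.

    Assumptions: eps in (0, 1) is the tolerated cosine alignment of the DC channel,
    n_layers stands in for the depth N, eps_mach is the machine epsilon of the phase format.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    aliasing = context_len / (2.0 * math.pi)                     # from L < 2*pi*base
    stability = context_len / math.acos(eps ** (1.0 / n_layers))  # base >= L / arccos(eps^(1/N))
    lower = max(aliasing, stability)
    upper = 1.0 / eps_mach                                        # base < 1/eps_mach
    return lower, upper

# Example: 128k context, eps = 0.5, 32 layers, float32 machine epsilon
lower, upper = rope_base_window(context_len=131072, eps=0.5, n_layers=32, eps_mach=2.0**-23)
print(f"{lower:,.0f} <= base < {upper:,.0f}; feasible: {lower < upper}")
```

With `eps` in (0, 1) the arccos term is below $\pi/2$, so under these assumptions the stability bound always dominates the aliasing bound; raising `n_layers` pushes `lower` up toward the fixed `upper` ceiling.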

Abstract

Rotary positional embeddings (RoPE) are widely used in large language models to encode token positions through multiplicative rotations, yet their behavior at long context lengths remains poorly characterized. In this work, we reinterpret RoPE as phase modulation applied to a bank of complex oscillators, enabling analysis through classical signal processing theory. Under this formulation, we derive principled lower bounds on the RoPE base parameter that are necessary to preserve positional coherence over a target context length. These include a fundamental aliasing bound, analogous to a Nyquist limit, and a DC-component stability bound that constrains phase drift in low-frequency positional modes. We further extend this analysis to deep transformers, showing that repeated rotary modulation across layers compounds angular misalignment, tightening the base requirement as depth increases. Complementing these results, we derive a precision-dependent upper bound on the RoPE base arising from finite floating-point resolution. Beyond this limit, incremental phase updates become numerically indistinguishable, leading to positional erasure even in the absence of aliasing. Together, the lower and upper bounds define a precision- and depth-dependent feasibility region, a Goldilocks zone, for long-context transformers. We validate the framework through a comprehensive case study of state-of-the-art models, including LLaMA, Mistral, and DeepSeek variants, showing that observed successes, failures, and community retrofits align closely with the predicted bounds. Notably, models that violate the stability bound exhibit attention collapse and long-range degradation, while attempts to scale beyond one million tokens encounter a hard precision wall independent of architecture or training.
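The phase-modulation reading can be checked numerically. This minimal sketch assumes the standard frequency schedule $\theta_k = \text{base}^{-2k/d}$ and the interleaved pairing of adjacent dimensions into complex numbers (some implementations split the vector into halves instead); the function names are illustrative. Each pair is one oscillator, and rotating it by $p\,\theta_k$ is a pure phase shift, so the query-key score depends only on the offset $m - n$.

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    # Standard RoPE schedule (assumed): theta_k = base**(-2k/dim), k = 0 .. dim/2 - 1
    k = np.arange(dim // 2)
    return base ** (-2.0 * k / dim)

def rope_rotate(x, pos, base=10000.0):
    # View adjacent pairs (x0, x1), (x2, x3), ... as a bank of complex oscillators
    z = x[0::2] + 1j * x[1::2]
    theta = rope_frequencies(x.shape[-1], base)
    # Multiplicative rotation = phase modulation by pos * theta_k in every channel
    return z * np.exp(1j * pos * theta)

def rope_score(q, k, m, n, base=10000.0):
    # Re(sum q_m * conj(k_n)) equals the real dot product of the rotated vectors
    return np.real(np.sum(rope_rotate(q, m, base) * np.conj(rope_rotate(k, n, base))))

rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)
# Same offset m - n = 7 at two absolute positions gives the same score
print(np.isclose(rope_score(q, k, 10, 3), rope_score(q, k, 107, 100)))
```

Because each oscillator has unit-magnitude modulation, rotation preserves the vector norm; only the relative phase $(m - n)\theta_k$ survives in the score, which is what makes aliasing and phase drift in the slowest channels the relevant failure modes.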

Paper Structure (41 sections, 1 theorem, 48 equations, 3 figures, 1 table)

This paper contains 41 sections, 1 theorem, 48 equations, 3 figures, 1 table.

Introduction
Contributions
Relation to prior work
Theoretical background
Rotary Positional Embeddings
Complex-Valued Representation
RoPE as Phase Modulation of Oscillator Banks
Depth and Repeated Rotary Modulation
Numerical Precision Considerations
Summary and Setup for Analysis
Theoretical bound for RoPE base
Preliminaries and Notation
Theoretical lower bound for RoPE base
The Fundamental Aliasing Limit
Single-layer DC-component stability limit
...and 26 more sections

Key Result

Lemma B.1

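If there exists a positive integer $T \le L$ such that $\theta_k T \in 2\pi\mathbb{Z}$ for some channel $k$, where $\theta_k$ is the rotation frequency of that channel, then for every position $p$ we have $e^{i(p+T)\theta_k} = e^{ip\theta_k}$ for that channel. Consequently, the positional encoding is not injective on $[0, L]$; it repeats with period $T$.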
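The lemma and the aliasing limit can be illustrated numerically. In this sketch the values $T = 1000$, $m = 3$, head dimension 128, and base 10,000 are illustrative choices (the base matches Figure 1's caption), and the schedule $\theta_k = \text{base}^{-2k/d}$ is assumed. The slowest channel has period $2\pi \cdot \text{base}^{(d-2)/d}$, which the bound $L < 2\pi \cdot \text{base}$ presumably approximates with $\theta_{\min} \approx 1/\text{base}$.

```python
import numpy as np

def phasor(pos, theta):
    # One RoPE channel as a unit-magnitude complex oscillator
    return np.exp(1j * pos * theta)

# Lemma B.1 hypothesis with illustrative integers: theta_k * T = 2*pi*m
T, m = 1000, 3
theta_k = 2 * np.pi * m / T
p = np.arange(0, 5000)
# Positions p and p + T yield identical phasors, so this channel cannot tell them apart
periodic = np.allclose(phasor(p + T, theta_k), phasor(p, theta_k))

def channel_periods(dim, base):
    # Full-rotation period 2*pi/theta_k = 2*pi*base**(2k/dim) for each channel k
    k = np.arange(dim // 2)
    return 2 * np.pi * base ** (2.0 * k / dim)

def wrapped_channels(context_len, dim=128, base=10000.0):
    # Number of channels completing at least one full rotation within the context
    return int(np.sum(channel_periods(dim, base) <= context_len))

slowest = channel_periods(128, 10000.0)[-1]  # 2*pi*base**((dim-2)/dim)
print(periodic, round(slowest), wrapped_channels(4096), wrapped_channels(131072))
```

Real frequencies are generally not commensurate with $2\pi$ at an integer $T$, so wrapped channels only approximately revisit earlier phases; the lemma's exact condition isolates the worst case.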

Figures (3)

Figure 1: Fundamental Aliasing in RoPE Embeddings (Base=10,000).
Figure 2: Effect of RoPE Base on DC-Component Stability.
Figure 3: Effect of RoPE Base on DC-Component Stability.

Theorems & Definitions (2)

Lemma B.1: Periodicity leads to indistinguishability
Proof of Lemma B.1