3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

Xindian Ma, Wenyuan Liu, Peng Zhang, Nan Xu

TL;DR

This work addresses RoPE's limitations in long-context modeling by introducing 3D Rotary Position Encoding (3D-RPE) on a Bloch Sphere. By partitioning sequences into chunks and applying two rotation angles (one within a chunk, one between chunks), 3D-RPE achieves controllable long-term decay and enhanced positional resolution, even under linear interpolation. Theoretical results (including bounds and Theorem 1) and training-free experiments demonstrate superior long-context NLU and promising long-sequence LM performance, with robust improvements when extending context windows up to 100k tokens. The approach could improve long-context understanding and generation in LLMs and may extend to multimodal settings due to its 3D geometric framing.

Abstract

Inspired by the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows for the regulation of long-term decay within the chunk size, ensuring the modeling of relative positional information between tokens at a distant relative position. For enhanced position resolution, 3D-RPE can mitigate the degradation of position resolution caused by position interpolation on RoPE. We have conducted experiments on long-context Natural Language Understanding (NLU) and long-sequence Language Modeling (LM) tasks. From the experimental results, 3D-RPE achieved performance improvements over RoPE, especially in long-context NLU tasks.
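
The resolution loss that the abstract attributes to position interpolation can be illustrated with a small sketch of standard RoPE angles under linear interpolation. This is not the paper's code: the function names, the head dimension of 128, the base of 10000, and the lengths 4096 and 16384 are all assumptions chosen for illustration.

```python
def rope_angle(position, dim_pair, head_dim, base=10000.0):
    """Rotation angle standard RoPE applies to one 2D pair at a position."""
    theta = base ** (-2.0 * dim_pair / head_dim)  # per-pair frequency
    return position * theta


def interpolated_angle(position, dim_pair, head_dim,
                       pretrained_len, target_len, base=10000.0):
    """Linear position interpolation: squeeze positions back into the
    pre-trained range by the factor pretrained_len / target_len."""
    scale = pretrained_len / target_len
    return rope_angle(position * scale, dim_pair, head_dim, base)


# Hypothetical setup: pre-trained length 4096, extended to 16384.
HEAD_DIM, L_P, L_EXT = 128, 4096, 16384

# Angle gap between two adjacent tokens for the highest-frequency pair.
plain_gap = rope_angle(1, 0, HEAD_DIM) - rope_angle(0, 0, HEAD_DIM)
interp_gap = (interpolated_angle(1, 0, HEAD_DIM, L_P, L_EXT)
              - interpolated_angle(0, 0, HEAD_DIM, L_P, L_EXT))

print(plain_gap, interp_gap)
```

Under these assumed values the gap between adjacent positions shrinks by the factor 4096/16384, which is the kind of degradation in positional resolution that 3D-RPE is said to mitigate.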

Paper Structure (24 sections, 2 theorems, 25 equations, 4 figures, 5 tables)

This paper contains 24 sections, 2 theorems, 25 equations, 4 figures, 5 tables.

Key Result

Theorem 1

For a pre-trained language model with a length of $L_p$ and an extension length requirement of $L$, employing linear position interpolation extension methods $\mathcal{I}$ based on Rotary Position Encoding (RoPE) can elevate the relative positional resolution from $\mathcal{E}_{rope}$ to $\mathcal{E

Figures (4)

  • Figure 1: 2D Rotary Position Encoding (RoPE) vs. 3D Rotary Position Encoding (3D-RPE).
  • Figure 2: Visualization of the 3D Rotary Position Encoding (3D-RPE). The context size is $L$, and the chunk size is $c$. The vectors ${[\bm{h}_{j,m}^{1}, \bm{h}_{j,m}^{2}]}^{T}$ and ${[-\bm{h}_{j,m}^{2}, \bm{h}_{j,m}^{1}]}^{T}$ form an orthogonal basis, corresponding to the $\ket{1}$ and $\ket{0}$ states in Eq. (\ref{bs-equ}). The components $\bm{h}_{j,m}^{1}$ and $\bm{h}_{j,m}^{2}$ represent the first and second dimensions of the state vector $\bm{h}_{j,m}$, which is the $m$-th token in the $j$-th chunk.
  • Figure 3: Visualization of the Relative Position Matrix $\bm{A}$ employing 3D-RPE, with chunk size $c=4$ and sequence size $L=12$. Each matrix element $A_{i,j}$ represents the relative position between the $i$-th query vector $\bm{q}$ and the $j$-th key vector $\bm{k}$.
  • Figure 4: A diagram of the Bloch Sphere.
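
The chunked indexing behind Figures 2 and 3 can be sketched as follows. This is a minimal illustration that assumes 0-based positions and the decomposition position = j * c + m; the helper name `chunk_coordinates` is hypothetical, and the paper's own indexing convention and rotation-angle formulas are not reproduced here.

```python
def chunk_coordinates(position, chunk_size):
    """Split a 0-based token position into (chunk index j, within-chunk index m)."""
    return divmod(position, chunk_size)


# Sizes taken from the Figure 3 caption: sequence size L = 12, chunk size c = 4.
L, c = 12, 4
coords = [chunk_coordinates(p, c) for p in range(L)]

for p, (j, m) in enumerate(coords):
    print(f"position {p:2d} -> chunk j={j}, offset m={m}")
```

Each token then carries two indices, one that a within-chunk rotation could use and one that a between-chunk rotation could use; how 3D-RPE turns them into angles is defined in the paper itself.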

Theorems & Definitions (5)

  • Definition 1: 3D Rotary Position Encoding
  • Theorem 1: Enhanced Position Resolution
  • Definition 2: Positional Interpolation Resolution
  • Theorem 2: Chunk Position Encoding Resolution Enhancement
  • proof