3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
Xindian Ma, Wenyuan Liu, Peng Zhang, Nan Xu
TL;DR
This work addresses RoPE's limitations in long-context modeling by introducing 3D Rotary Position Encoding (3D-RPE), a rotary encoding defined on the Bloch Sphere. 3D-RPE partitions a sequence into chunks and applies two rotation angles, one within a chunk and one between chunks, which yields controllable long-term decay and better positional resolution, even under linear position interpolation. Theoretical results (including bounds and Theorem 1) and training-free experiments show stronger long-context NLU performance and promising long-sequence LM performance, with improvements that hold when context windows are extended up to 100k tokens. The approach could improve long-context understanding and generation in LLMs, and its 3D geometric framing may extend to multimodal settings.
Abstract
Inspired by the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows long-term decay to be regulated within the chunk size, so that relative positional information can still be modeled between tokens that are far apart. For improved position resolution, 3D-RPE mitigates the degradation of position resolution caused by position interpolation on RoPE. We conducted experiments on long-context Natural Language Understanding (NLU) and long-sequence Language Modeling (LM) tasks. The results show that 3D-RPE improves on RoPE, especially on long-context NLU tasks.
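The chunking idea can be illustrated with a minimal Python sketch. It only shows how a token position might be split into a chunk index and a within-chunk offset, each driving its own rotation angle. The function names and the base angles `theta` and `phi` are hypothetical and chosen for illustration; this is not the paper's exact formulation, which is defined on the Bloch Sphere.

```python
from typing import Tuple


def split_position(pos: int, chunk_size: int) -> Tuple[int, int]:
    """Return (chunk index, offset within the chunk) for a token position."""
    return divmod(pos, chunk_size)


def rotation_angles(
    pos: int, chunk_size: int, theta: float, phi: float
) -> Tuple[float, float]:
    """Return (within-chunk angle, between-chunk angle) for a position.

    theta and phi are assumed per-step base angles, not values from the paper.
    """
    chunk_idx, offset = split_position(pos, chunk_size)
    within_angle = offset * theta    # advances with position inside a chunk
    between_angle = chunk_idx * phi  # advances once per chunk
    return within_angle, between_angle
```

Under these assumptions, two tokens at the same offset in different chunks share a within-chunk angle and differ only in the between-chunk angle, so the within-chunk angle never grows beyond what a single chunk allows.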
