Do traveling waves make good positional encodings?
Chase van de Geijn, Ayush Paliwal, Timo Lüddecke, Alexander S. Ecker
TL;DR
RollPE proposes a traveling-wave-based positional encoding for transformers that circularly rolls queries and keys, which turns attention into a function of relative position. The resulting relative, topography-friendly encoding outperforms absolute encodings and performs comparably to RoPE. It also admits a continuous generalization via Lie algebra and is spectrally equivalent to a particular configuration of RoPE. The authors connect RollPE to topographic regularization and neuroscience-inspired models. They argue that traveling-wave dynamics underlie effective positional encoding and offer a lens through which to view RoPE. These insights suggest a simpler, wave-based interpretation of RoPE and a bridge between brain-inspired dynamics and transformer attention.
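The rolling mechanism summarized above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. It assumes that the query and key vectors of the token at position p are circularly rolled by p steps along the feature axis. The names `roll_positions`, `rollpe_scores`, and the `step` parameter are hypothetical. The sketch checks that the resulting attention score depends only on the offset j - i.

```python
import numpy as np


def roll_positions(x: np.ndarray, step: int = 1) -> np.ndarray:
    """Circularly roll row p of x (shape: seq_len x dim) along the feature
    axis by p * step. Hypothetical RollPE-style encoding for illustration."""
    return np.stack([np.roll(row, p * step) for p, row in enumerate(x)])


def rollpe_scores(q: np.ndarray, k: np.ndarray, step: int = 1) -> np.ndarray:
    """Unnormalized attention scores between rolled queries and rolled keys."""
    return roll_positions(q, step) @ roll_positions(k, step).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, dim = 6, 8
    # Reuse the same content vector at every position, so any variation
    # in the scores comes from position alone.
    q = np.tile(rng.standard_normal(dim), (seq_len, 1))
    k = np.tile(rng.standard_normal(dim), (seq_len, 1))
    scores = rollpe_scores(q, k)
    # Shifting both positions by one leaves every score unchanged (prints True).
    print(np.allclose(scores[:-1, :-1], scores[1:, 1:]))
    # The reason: roll(q, i) . roll(k, j) == q . roll(k, j - i) (prints True).
    print(np.isclose(scores[1, 4], q[0] @ np.roll(k[0], 3)))
```

The key fact is that a circular roll is a permutation and therefore orthogonal. Writing R for a roll by one step, this gives (R^i q) . (R^j k) = q . (R^(j-i) k). RoPE obtains the same relative property from rotation matrices.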
Abstract
Transformers rely on positional encoding to compensate for the inherent permutation invariance of self-attention. Traditional approaches use absolute sinusoidal embeddings or learned positional vectors, while more recent methods emphasize relative encodings to better capture translation equivariance. In this work, we propose RollPE, a novel positional encoding mechanism based on traveling waves, implemented by applying a circular roll operation to the query and key tensors in self-attention. This operation induces a relative phase shift across positions, allowing the model to compute attention as a function of positional differences rather than absolute indices. We show that this simple method significantly outperforms traditional absolute positional embeddings and performs comparably to RoPE. We derive a continuous version of RollPE, which implicitly imposes a topographic structure on the query and key space. We further show that RollPE is mathematically equivalent to a particular configuration of RoPE. Viewing RollPE through the lens of traveling waves may allow us to simplify RoPE and relate it to processes of information flow in the brain.
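The discrete Fourier transform offers one way to see the kind of RollPE-RoPE equivalence described above. A circular shift by s multiplies the m-th Fourier coefficient of a d-dimensional vector by exp(-2πi m s / d). In other words, it rotates that coefficient by an angle that is linear in s. The sketch below is a hypothetical illustration, not the paper's derivation, and the function names are invented for it. It computes the relative score q . roll(k, s) both directly and as a sum of rotated Fourier coefficients.

```python
import numpy as np


def roll_via_fourier(x: np.ndarray, shift: int) -> np.ndarray:
    """Circular shift implemented as a per-frequency phase rotation."""
    d = x.shape[-1]
    theta = 2 * np.pi * np.arange(d) / d  # evenly spaced frequencies
    return np.fft.ifft(np.fft.fft(x) * np.exp(-1j * theta * shift)).real


def fourier_score(q: np.ndarray, k: np.ndarray, offset: int) -> float:
    """q . roll(k, offset) computed in the Fourier domain via Parseval:
    each frequency's product conj(Q_m) * K_m is rotated by -theta_m * offset."""
    d = q.shape[-1]
    theta = 2 * np.pi * np.arange(d) / d
    Q, K = np.fft.fft(q), np.fft.fft(k)
    total = np.sum(np.conj(Q) * K * np.exp(-1j * theta * offset))
    return float(np.real(total) / d)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    q, k = rng.standard_normal(8), rng.standard_normal(8)
    for s in range(8):
        # Rolling in the feature domain == rotating phases in the Fourier domain.
        assert np.allclose(roll_via_fourier(k, s), np.roll(k, s))
        assert np.isclose(fourier_score(q, k, s), q @ np.roll(k, s))
    print("Fourier-domain rotations reproduce circular shifts")
```

For real vectors, the coefficients at frequencies m and d - m are complex conjugates. Apart from the m = 0 term (and, for even d, the m = d/2 term), the rotations therefore pair up into real 2x2 rotations, which is the building block RoPE uses. The frequencies here are evenly spaced (θ_m = 2πm/d), whereas standard RoPE uses geometrically spaced frequencies. This fits the abstract's statement that the equivalence holds for a particular configuration of RoPE.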
