Spline-based Transformers
Prashanth Chandran, Agon Serifi, Markus Gross, Moritz Bächer
TL;DR
Spline-based Transformers replace traditional absolute positional encoding by embedding an input sequence as a continuous latent spline trajectory defined by control points in $\mathbb{R}^d$. Learnable control tokens are concatenated to the encoder input, and the encoder maps them to latent spline control points. Evaluating the spline yields a trajectory $\mathbf{s}(t)$, which the decoder uses to reconstruct the sequence without any explicit positional encoding. On synthetic curves, images, motion, and hair-geometry tasks, the method reconstructs inputs more accurately than baselines such as ALiBi and ALiBi-Cat, and it supports interactive latent-space edits through direct manipulation of the control points. The approach is simple to implement and gives a controllable latent space, which suggests broad applicability. Its main reported weakness is sensitivity to learning-rate scheduling, which calls for further work on training stability.
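The step of evaluating a latent trajectory $\mathbf{s}(t)$ from control points can be sketched as follows. This is only an illustration under stated assumptions: the summary does not say which spline family the method uses, so a Bézier curve is assumed here, and the names `bezier_point` and `sample_trajectory`, the 2D latent dimension, and the control-point values are all hypothetical.

```python
from math import comb


def bezier_point(controls, t):
    """Evaluate a Bezier curve at t in [0, 1] using the Bernstein basis.

    ASSUMPTION: a Bezier curve stands in for the latent spline; the source
    does not specify the spline type.
    """
    n = len(controls) - 1          # curve degree
    dim = len(controls[0])         # latent dimension d
    point = [0.0] * dim
    for i, control in enumerate(controls):
        weight = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        for k in range(dim):
            point[k] += weight * control[k]
    return point


def sample_trajectory(controls, num_tokens):
    """Sample one latent vector per output token at uniform t in [0, 1]."""
    if num_tokens == 1:
        return [bezier_point(controls, 0.0)]
    return [bezier_point(controls, j / (num_tokens - 1)) for j in range(num_tokens)]


# Hypothetical latent control points (d = 2 for readability).
controls = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]]

# Position enters only through the curve parameter t, so the same control
# points can be sampled at any sequence length.
latents = sample_trajectory(controls, 5)
longer = sample_trajectory(controls, 9)

# A latent edit: move one control point and resample the trajectory.
edited = [list(c) for c in controls]
edited[1][1] += 1.0
edited_latents = sample_trajectory(edited, 5)
```

In this sketch the decoder would consume `latents` in place of position-encoded tokens. Sampling the same curve more densely (`longer`) gives a longer sequence, and moving a control point (`edited`) changes the whole trajectory smoothly.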
Abstract
We introduce Spline-based Transformers, a novel class of Transformer models that eliminate the need for positional encoding. Inspired by workflows using splines in computer animation, our Spline-based Transformers embed an input sequence of elements as a smooth trajectory in latent space. Besides overcoming drawbacks of positional encoding such as poor sequence length extrapolation, Spline-based Transformers provide a novel way for users to interact with Transformer latent spaces: by directly manipulating the latent control points, users can create new latent trajectories and sequences. We demonstrate the superior performance of our approach in comparison to conventional positional encoding on a variety of datasets, ranging from synthetic 2D to large-scale real-world datasets of images, 3D shapes, and animations.
