
Spline-based Transformers

Prashanth Chandran, Agon Serifi, Markus Gross, Moritz Bächer

TL;DR

Spline-based Transformers replace absolute positional encoding by embedding an input sequence as a continuous latent trajectory: a spline defined by learnable control points in $\mathbb{R}^d$. The encoder processes the input sequence concatenated with learnable control tokens and produces the latent spline control points. Evaluating the spline yields a trajectory $\mathbf{s}(t)$ from which the decoder reconstructs the sequence, so no explicit positional encoding is needed. Across synthetic curves, images, motion, and hair-geometry tasks, the method achieves better reconstruction quality than the ALiBi and ALiBi-Cat baselines and enables interactive latent-space edits through control-point manipulation. The approach is simple to implement, offers a controllable latent space, and may be broadly applicable, though it is sensitive to learning-rate scheduling, which warrants further work on training stability.
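The spline-evaluation step described above can be sketched as follows, assuming a single Bézier curve whose Bernstein basis is evaluated at uniformly spaced parameters $t \in [0, 1]$, one per sequence element (the figure captions mention a cubic Bézier, i.e. four control points). The function name `bezier_trajectory` and these sampling choices are hypothetical, not taken from the authors' implementation.

```python
import numpy as np
from math import comb

def bezier_trajectory(control_points: np.ndarray, num_samples: int) -> np.ndarray:
    """Evaluate a Bezier curve of degree K-1 at num_samples uniform t in [0, 1].

    control_points: (K, d) latent control points (K = 4 for a cubic Bezier).
    Returns a (num_samples, d) latent trajectory s(t), one row per sequence element.
    Uniform sampling of t is an assumption made for this sketch.
    """
    K = control_points.shape[0]
    degree = K - 1
    t = np.linspace(0.0, 1.0, num_samples)[:, None]                  # (T, 1)
    k = np.arange(K)[None, :]                                         # (1, K)
    binom = np.array([comb(degree, i) for i in range(K)])[None, :]    # (1, K)
    basis = binom * t**k * (1.0 - t) ** (degree - k)                  # (T, K) Bernstein weights
    return basis @ control_points                                     # (T, d)

# Hypothetical example: 4 latent control points in R^8, decoded into a 16-step sequence.
C = np.random.default_rng(0).normal(size=(4, 8))
s = bezier_trajectory(C, num_samples=16)   # shape (16, 8)
```

Because the same control points can be sampled at any number of parameter values, a trajectory for a shorter or longer sequence only requires changing `num_samples`; how the authors actually choose the parameter values is not stated in this summary.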

Abstract

We introduce Spline-based Transformers, a novel class of Transformer models that eliminate the need for positional encoding. Inspired by workflows using splines in computer animation, our Spline-based Transformers embed an input sequence of elements as a smooth trajectory in latent space. Besides overcoming drawbacks of positional encoding such as poor sequence-length extrapolation, Spline-based Transformers provide a novel way for users to interact with Transformer latent spaces by directly manipulating the latent control points to create new latent trajectories and sequences. We demonstrate the superior performance of our approach compared with conventional positional encoding on a variety of datasets, ranging from synthetic 2D data to large-scale real-world datasets of images, 3D shapes, and animations.
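Both properties mentioned in the abstract, editing latent control points and producing sequences of different lengths, can be illustrated with a small sketch. It assumes a single Bézier curve evaluated with de Casteljau's algorithm at uniformly spaced parameters; the helper names (`de_casteljau`, `trajectory`, `move_toward`), the control-point indices, and the random 8-dimensional control points are hypothetical, and the decoder that would map each latent point back to a sequence element is omitted.

```python
import numpy as np

def de_casteljau(control_points: np.ndarray, t: float) -> np.ndarray:
    """Evaluate a Bezier curve at one parameter t by repeated linear interpolation."""
    pts = np.asarray(control_points, dtype=float)
    while pts.shape[0] > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def trajectory(control_points: np.ndarray, num_steps: int) -> np.ndarray:
    """Sample the latent curve at num_steps uniform parameters in [0, 1] (an assumption)."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return np.stack([de_casteljau(control_points, t) for t in ts])

def move_toward(control_points: np.ndarray, i: int, j: int, alpha: float) -> np.ndarray:
    """Return a copy in which control point i is moved a fraction alpha toward point j."""
    edited = np.array(control_points, dtype=float)  # np.array copies by default
    edited[i] = (1.0 - alpha) * edited[i] + alpha * edited[j]
    return edited

# Hypothetical latent control points: 4 points (cubic Bezier) in R^8.
C = np.random.default_rng(1).normal(size=(4, 8))

# Incrementally pull control point 2 toward control point 1 (illustrative indices).
edits = [trajectory(move_toward(C, 2, 1, a), num_steps=32) for a in (0.0, 0.5, 1.0)]

# The same control points can be sampled for a shorter or a longer sequence.
short_seq, long_seq = trajectory(C, 10), trajectory(C, 50)
```

Because a Bézier curve passes through its first and last control points, this kind of edit leaves the trajectory's endpoints fixed and only reshapes its interior.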

Paper Structure

This paper contains 20 sections, 2 equations, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: Spline-based Transformers. Our Spline-based Transformers encode an input sequence, together with learnable control tokens, into a trajectory in latent space defined by the latent control points of a spline curve.
  • Figure 1: We show hypotrochoid curves generated by our method, along with their latent control points (shown as dots) and the latent trajectories obtained with a cubic Bézier interpolator. Curves that look similar in Cartesian space (columns 1 and 2) tend to have similar latent control points and trajectories, while smoother curves (column 3) tend to have smoother latent trajectories.
  • Figure 2: Variations of Latent Spaces. Our Spline-based Transformers use multiple control points to evaluate a latent B-Spline, creating a $d$-dimensional trajectory in the model's latent space. In contrast, ALiBi duplicates a single control point and adds positional information to the copies, whereas ALiBi-Cat concatenates the positional information to the duplicated control point.
  • Figure 2: Modifying the control points alters the latent trajectories according to the chosen B-Spline (cubic Bézier in our case). Here we incrementally modify control point $C_{2}$ to be closer to $C_{1}$ and visualize the change in the latent space. The corresponding reconstructed curve is shown in the top row.
  • Figure 3: Our Spline-based Transformer can successfully reconstruct curves of different families with consistently better performance than ALiBi and ALiBi-Cat. In certain scenarios (third row), reconstructions from ALiBi and ALiBi-Cat can collapse to a single point, while our Spline-based Transformer successfully manages to recover the input curve.
  • ...and 8 more figures
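To make the contrast in Figure 2 concrete, here is a minimal sketch of how the two baselines could build a latent sequence from a single duplicated latent vector. The caption does not say what form the "positional information" takes (ALiBi itself is normally an attention-bias scheme), so the sinusoidal features below are purely an illustrative stand-in; the function names and dimensions are hypothetical.

```python
import numpy as np

def sinusoidal_positions(num_steps: int, dim: int) -> np.ndarray:
    """Transformer-style sinusoidal position features, shape (num_steps, dim).

    Illustrative stand-in for the unspecified positional information in Figure 2.
    """
    pos = np.arange(num_steps)[:, None]                        # (T, 1)
    i = np.arange(dim)[None, :]                                # (1, dim)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)      # (T, dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def duplicated_add(z: np.ndarray, num_steps: int) -> np.ndarray:
    """'ALiBi' baseline as captioned: copy one latent T times, then add positions."""
    return np.tile(z, (num_steps, 1)) + sinusoidal_positions(num_steps, z.shape[0])

def duplicated_cat(z: np.ndarray, num_steps: int, pos_dim: int) -> np.ndarray:
    """'ALiBi-Cat' baseline as captioned: copy one latent T times, then concatenate positions."""
    copies = np.tile(z, (num_steps, 1))                        # (T, d)
    return np.concatenate([copies, sinusoidal_positions(num_steps, pos_dim)], axis=1)

z = np.random.default_rng(2).normal(size=8)                    # single hypothetical latent in R^8
add_latents = duplicated_add(z, num_steps=16)                  # (16, 8)
cat_latents = duplicated_cat(z, num_steps=16, pos_dim=4)       # (16, 12)
```

In both baselines every latent row carries the same content vector and differs only in its positional part, whereas the spline variant derives its latents from several control points; this is the distinction Figure 2 highlights.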