RotRNN: Modelling Long Sequences with Rotations

Kai Biegun; Rares Dolga; Jake Cunningham; David Barber


TL;DR

RotRNN is a simple and efficient linear recurrent model built on rotation matrices. It has a robust normalisation procedure and a practical implementation that remains faithful to its theoretical derivation, and it achieves performance competitive with state-of-the-art linear recurrent models on several long sequence modelling datasets.

Abstract

Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, their empirical performance is not well understood and they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation. RotRNN also achieves performance competitive with state-of-the-art linear recurrent models on several long sequence modelling datasets.


Paper Structure (31 sections, 5 theorems, 21 equations, 3 figures, 3 tables)

Introduction
Background
State Space Models
Linear Recurrent Units
Drawbacks of Prior Works
RotRNN
Parameterising Rotation Matrices
Efficient Rotational Recurrence
Normalisation
Multi-Head Decay
Analysis of RotRNN and Prior Work
Linear Recurrent Units
Multihead RotRNN as a special case of the LRU
Normalisation differences
State Space Models
...and 16 more sections

Key Result

Lemma 0

Let $M\in\mathbb{R}^{N\times N}$, let $S = M - M^\top$, and define $\exp(S) := \sum_{k=0}^\infty \frac{1}{k!} S^k$ as the matrix exponential. Then $A = \exp{(S)} \in SO(N)$.
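The lemma can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, uses a hypothetical helper `expm_taylor` that truncates the series in the definition above, and tests the two defining properties of $SO(N)$, namely $A^\top A = I$ and $\det A = 1$, on a random $M$.

```python
import numpy as np

def expm_taylor(S, terms=60):
    """Truncated series sum_{k<terms} S^k / k! (hypothetical helper, fine for small S)."""
    result = np.eye(S.shape[0])
    term = np.eye(S.shape[0])
    for k in range(1, terms):
        term = term @ S / k          # builds S^k / k! incrementally
        result = result + term
    return result

rng = np.random.default_rng(0)
N = 4
M = rng.normal(size=(N, N))          # arbitrary real matrix
S = M - M.T                          # skew-symmetric by construction: S^T = -S
A = expm_taylor(S)

orth_err = np.max(np.abs(A.T @ A - np.eye(N)))   # deviation from orthogonality
det_A = np.linalg.det(A)                         # should be close to +1
print(orth_err, det_A)
```

Because $S^\top = -S$, we have $\exp(S)^\top \exp(S) = \exp(-S)\exp(S) = I$, and $\det \exp(S) = \exp(\operatorname{tr} S) = \exp(0) = 1$, which is what the two printed quantities probe. The truncated series is only for illustration; a library matrix exponential would be the more robust choice for larger matrices.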

Figures (3)

Figure 1: Full neural network architecture of the RotRNN. Here $T$ denotes the length of the input sequence, and ${\mathcal{D}_u}$ denotes the number of channels in the input data.
Figure 2: A visualisation of the multi-headed RotRNN layer outlined in Section \ref{sec:mulit-head}. The ${\mathcal{D}_u}$-dimensional input sequence $u_t$, $t=1,\dots,T$, is projected onto each of the $H$ heads of dimension ${\mathcal{D}_h}$ by the $B^{(h)}$ matrices. Each head then independently performs a linear recurrence with different rotations and decay scales. The outputs of each head are concatenated and mixed linearly to form the final ${\mathcal{D}_u}$-dimensional output $y_t$.
Figure 3: Average hidden state norm across training on ListOps for the LRU (Orvieto et al., 2023) and RotRNN. Error bars show the standard deviation of the means. The error bars for RotRNN are present, but are mostly too small to be visible.
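
The data flow described in the Figure 2 caption can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's implementation: the per-head update is assumed to take the form $x^{(h)}_t = \gamma_h A^{(h)} x^{(h)}_{t-1} + B^{(h)} u_t$ with a scalar decay $\gamma_h$ and a rotation $A^{(h)}$, the output mixing matrix is given the hypothetical name `C`, the rotations are drawn with a hypothetical QR-based helper `random_rotation`, and the normalisation procedure and efficient recurrence are omitted entirely.

```python
import numpy as np

def random_rotation(rng, dim):
    """Illustrative way to draw a matrix in SO(dim): QR, then fix the sign of det."""
    Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]           # flipping one column turns det -1 into +1
    return Q

def multihead_layer(u, B, A, gamma, C):
    """u: (T, D_u), B: (H, D_h, D_u), A: (H, D_h, D_h), gamma: (H,), C: (D_u, H*D_h)."""
    H, D_h, _ = B.shape
    x = np.zeros((H, D_h))           # one hidden state per head
    ys = []
    for u_t in u:
        # Assumed per-head update: decay * rotation of old state + projected input.
        x = gamma[:, None] * np.einsum("hij,hj->hi", A, x) \
            + np.einsum("hij,j->hi", B, u_t)
        ys.append(C @ x.reshape(-1))  # concatenate heads, then mix linearly
    return np.stack(ys)              # (T, D_u)

rng = np.random.default_rng(0)
T, D_u, H, D_h = 16, 3, 4, 2
u = rng.normal(size=(T, D_u))
B = rng.normal(size=(H, D_h, D_u))
A = np.stack([random_rotation(rng, D_h) for _ in range(H)])
gamma = np.array([0.5, 0.8, 0.9, 0.99])   # a different decay scale per head
C = rng.normal(size=(D_u, H * D_h))
y = multihead_layer(u, B, A, gamma, C)
print(y.shape)
```

The explicit loop over time steps is chosen for readability; since the recurrence is linear, it could also be evaluated with a parallel scan.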

Theorems & Definitions (10)

Lemma 0
proof
Lemma 0
proof
Lemma 0
proof
Lemma 1
proof
Lemma 1
proof
