RotRNN: Modelling Long Sequences with Rotations

Kai Biegun, Rares Dolga, Jake Cunningham, David Barber

TL;DR

RotRNN is a simple and efficient linear recurrent model with a robust normalisation procedure and a practical implementation that remains faithful to its theoretical derivation. It achieves performance competitive with state-of-the-art linear recurrent models on several long sequence modelling datasets.

Abstract

Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, their empirical performance is not well understood and they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN, a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation. RotRNN also achieves performance competitive with state-of-the-art linear recurrent models on several long sequence modelling datasets.

Paper Structure (31 sections, 5 theorems, 21 equations, 3 figures, 3 tables)

Key Result

Lemma 0

Let $M\in\mathbb{R}^{N\times N}$, let $S = M - M^\top$, and define $\exp(S) := \sum_{k=0}^\infty \frac{1}{k!} S^k$ as the matrix exponential. Then $A = \exp{(S)} \in SO(N)$.
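
The lemma can be checked numerically. Since $S^\top = -S$, we have $\exp(S)^\top \exp(S) = \exp(-S)\exp(S) = I$, and $\det \exp(S) = \exp(\operatorname{tr} S) = 1$. The sketch below is a minimal illustration, not the paper's implementation: the function name `skew_exp`, the truncation length `terms`, and the scaling of the random matrix are assumptions chosen for the example, and the power series is truncated rather than evaluated exactly.

```python
import numpy as np

def skew_exp(M, terms=40):
    """Approximate exp(M - M^T) with a truncated power series (hypothetical helper)."""
    S = M - M.T                      # skew-symmetric: S^T = -S
    A = np.eye(M.shape[0])           # k = 0 term of the series
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ S / k          # term now holds S^k / k!
        A = A + term
    return A

rng = np.random.default_rng(0)
M = 0.5 * rng.standard_normal((4, 4))   # arbitrary real matrix, kept small for the series
A = skew_exp(M)

orth_err = np.max(np.abs(A.T @ A - np.eye(4)))  # distance from orthogonality
det_A = np.linalg.det(A)                        # should be close to +1
print(orth_err, det_A)
```

Up to floating-point and truncation error, `A.T @ A` should match the identity and `det_A` should be close to 1, which is what membership in $SO(N)$ requires.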

Figures (3)

  • Figure 1: Full neural network architecture of the RotRNN. Here $T$ denotes the length of the input sequence, and ${\mathcal{D}_u}$ denotes the number of channels in the input data.
  • Figure 2: A visualisation of the multi-headed RotRNN layer outlined in Section \ref{sec:mulit-head}. The ${\mathcal{D}_u}$-dimensional input sequence $u_t$, $t=1,\dots,T$, is projected onto each of the $H$ heads of dimension ${\mathcal{D}_h}$ by the $B^{(h)}$ matrices. Each head then independently performs a linear recurrence with its own rotation and decay scale. The outputs of all heads are concatenated and mixed linearly to form the final ${\mathcal{D}_u}$-dimensional output $y_t$.
  • Figure 3: Average hidden state norm across training on ListOps for the LRU (Orvieto et al., 2023) and RotRNN. Error bars show the standard deviation of the means. The error bars for RotRNN are present but mostly too small to be visible.
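
The data flow described in the Figure 2 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the recurrence form $x_t^{(h)} = \gamma_h A_h x_{t-1}^{(h)} + B^{(h)} u_t$, the names `rotation`, `multi_head_layer`, `gamma` and `W`, and the random parameter values are all hypothetical. The caption only fixes the projection by $B^{(h)}$, the independent per-head recurrences with rotations and decay scales, and the final linear mixing.

```python
import numpy as np

def rotation(M, terms=40):
    """Rotation matrix exp(M - M^T) via a truncated power series (hypothetical helper)."""
    S = M - M.T
    A = term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ S / k
        A = A + term
    return A

def multi_head_layer(u, B, A, gamma, W):
    """Assumed shapes: u (T, D_u), B (H, D_h, D_u), A (H, D_h, D_h), gamma (H,), W (D_u, H*D_h)."""
    H, D_h, _ = B.shape
    x = np.zeros((H, D_h))                       # one hidden state per head
    ys = []
    for t in range(u.shape[0]):
        # Each head rotates and decays its own state, then adds its projected input.
        x = gamma[:, None] * np.einsum("hij,hj->hi", A, x) + B @ u[t]
        ys.append(W @ x.reshape(-1))             # concatenate heads, mix linearly
    return np.stack(ys)

rng = np.random.default_rng(0)
T, D_u, H, D_h = 16, 3, 2, 4
u = rng.standard_normal((T, D_u))
B = rng.standard_normal((H, D_h, D_u))
A = np.stack([rotation(0.5 * rng.standard_normal((D_h, D_h))) for _ in range(H)])
gamma = np.array([0.9, 0.99])                    # assumed per-head decay scales in (0, 1)
W = rng.standard_normal((D_u, H * D_h))
y = multi_head_layer(u, B, A, gamma, W)
```

Because each `A[h]` is norm-preserving, only `gamma[h]` shrinks the hidden state in this sketch, so the decay scale alone governs how quickly past inputs are forgotten.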

Theorems & Definitions (10)

  • Lemma 0
  • proof
  • Lemma 0
  • proof
  • Lemma 0
  • proof
  • Lemma 1
  • proof
  • Lemma 1
  • proof