Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures

Joshua Nunley

TL;DR

Using a minimal axiomatic setup, the paper derives recurrent and transformer templates from a shared skeleton in which the choice of subgroup acts as a drop-in replacement for the state space, tangent projection, and update map. It also reports a general linear-mixing extension in tangent space.

Abstract

This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.
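As a rough illustration of what "state space, tangent projection, and update map" can mean for the O(d) case, the sketch below keeps a hidden state on O(d) by projecting an arbitrary update matrix onto skew-symmetric matrices (the Lie algebra of O(d)) and then right-multiplying the state by the matrix exponential of that projection. This is an assumed instance, not the paper's stated parameterization: the function names (`skew_projection`, `expm_taylor`, `orthogonal_step`), the right-multiplication convention, and the truncated Taylor exponential are all choices made here for illustration.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def skew_projection(a):
    """Project a square matrix onto skew-symmetric matrices: (A - A^T) / 2."""
    n = len(a)
    return [[0.5 * (a[i][j] - a[j][i]) for j in range(n)] for i in range(n)]

def expm_taylor(a, terms=30):
    """Matrix exponential via a truncated Taylor series.

    Adequate for the small-norm inputs used here; not a general-purpose routine.
    """
    n = len(a)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term holds A^k / k! after this step
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

def orthogonal_step(h, raw_update):
    """Hypothetical update map: H_next = H * exp(skew(raw_update))."""
    omega = skew_projection(raw_update)
    return matmul(h, expm_taylor(omega))

def orthogonality_error(h):
    """Largest entry of |H^T H - I|; near zero when H is orthogonal."""
    n = len(h)
    gram = matmul(transpose(h), h)
    eye = identity(n)
    return max(abs(gram[i][j] - eye[i][j]) for i in range(n) for j in range(n))

# Illustrative values only; in a model, raw_update would come from the input token.
h0 = identity(3)
raw = [[0.2, -0.5, 0.1], [0.3, 0.0, 0.4], [-0.2, 0.1, -0.3]]
h1 = orthogonal_step(h0, raw)
h2 = orthogonal_step(h1, raw)
```

Swapping in another closed subgroup of U(d) would, under this reading, mean replacing `skew_projection` with the projection onto that subgroup's Lie algebra while leaving the rest of the step unchanged.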

Paper Structure

This paper contains 31 sections, 29 equations, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: Tiny Shakespeare scaling curve corresponding to Table \ref{tab:ts_scaling_summary}.
  • Figure 2: Penn Treebank scaling curve corresponding to Table \ref{tab:ptb_scaling_summary}.
  • Figure 3: Optimizer robustness summary on Tiny Shakespeare at 500K parameters (2-layer and 4-layer settings).