The Expressive Limits of Diagonal SSMs for State-Tracking

Mehran Shakerinava, Behnoush Khavari, Siamak Ravanbakhsh, Sarath Chandar

TL;DR

Single-layer input-Dependent Complex-valued Diagonal (DCD) SSMs cannot express state-tracking of any non-Abelian group at finite precision. The paper also identifies the precise expressivity range of $k$-layer DCD SSMs within the solvable groups.

Abstract

State-Space Models (SSMs) have recently been shown to achieve strong empirical performance on a variety of long-range sequence modeling tasks while remaining efficient and highly parallelizable. However, the theoretical understanding of their expressive power remains limited. In this work, we study the expressivity of input-Dependent Complex-valued Diagonal (DCD) SSMs on sequential state-tracking tasks. We show that single-layer DCD SSMs cannot express state-tracking of any non-Abelian group at finite precision. More generally, we show that $k$-layer DCD SSMs can express state-tracking of a group if and only if that group has a subnormal series of length $k$ with Abelian factors. That is, we identify the precise expressivity range of $k$-layer DCD SSMs within the solvable groups. Empirically, we find that multi-layer models often fail to learn state-tracking for non-Abelian groups, highlighting a gap between expressivity and learnability.

Paper Structure

This paper contains 40 sections, 6 theorems, 14 equations, 5 figures, and 4 tables.

Key Result

Theorem 1

There is a single-layer DCD SSM that tracks $G$ at finite precision iff $G$ is Abelian.
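
As a rough illustration of the "if" direction for one Abelian group, the sketch below tracks the cyclic group $\mathbb{Z}_n$ with a one-dimensional recurrence $h_t = a(x_t)\, h_{t-1}$, where $a(x) = e^{2\pi i x / n}$ is an input-dependent complex scalar. This recurrence form is an assumed simplification, since the paper's Definition 1 is not reproduced on this page, and the names `track_cyclic` and `a` are hypothetical. It is not the paper's construction.

```python
import cmath

def track_cyclic(inputs, n):
    """Track the running sum mod n with a 1-dimensional complex recurrence.

    Assumed simplified update: h_t = a(x_t) * h_{t-1},
    with a(x) = exp(2*pi*i*x/n) acting as a 1x1 "diagonal" transition.
    """
    h = 1 + 0j  # initial state encodes the identity element 0
    decoded = []
    for x in inputs:
        a = cmath.exp(2j * cmath.pi * x / n)  # input-dependent unit-modulus factor
        h = a * h
        # Decode by rounding the angle of h to the nearest n-th root of unity.
        k = round(cmath.phase(h) * n / (2 * cmath.pi)) % n
        decoded.append(k)
    return decoded

print(track_cyclic([1, 2, 2, 4, 3], 5))
```

Because multiplication of complex scalars commutes, the order of the inputs cannot affect the final state in this sketch, which gives some intuition for why a non-Abelian group would need more than this.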

Figures (5)

  • Figure 1: A group $G_k$ can be tracked by a $k$-layer DCD SSM iff the group has a subnormal chain of length $k$ with Abelian factors. Each layer tracks one such Abelian factor as shown in the diagram.
  • Figure 2: Compound automaton combining (left) the two-state parity automaton and (right) the three-state cyclic group automaton. The dashed curved line represents the connection between the two cyclic automata, equivalent to the semi-direct product of their groups. It shows that the automaton on the right depends on the state of the first automaton in addition to the original input. Commas separate inputs that produce the same state transition. States labelled start are the initial states of the automata.
  • Figure 3: We distinguish 4 cases for $\lambda$: (teal) $\| \lambda \| < 1$, (purple) $\| \lambda \| = 1, \lambda \neq 1$, (pink) $\lambda = 1$, (gray) $\| \lambda \| > 1$.
  • Figure 4: A simple example where the first-layer state is in $\mathbb{C}^1$, the group $G$ is partitioned into 3 cosets, and each equivalence class has size 2.
  • Figure 5: (a) Model is in state $g'$ and receives input $g$. (b) First layer updates from $h'$ to $h'h$ and outputs $\kappa(h', g)$. (c) Higher layers update from $n'$ to $n'\kappa(h', g)$. The final collective state is $n'\kappa(h',g)s(h'h) = g'g$ as desired.
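
The identity in the Figure 5 caption can be checked by brute force on a small example. The sketch below uses $S_3 \cong \mathbb{Z}_3 \rtimes \mathbb{Z}_2$, the semi-direct product of the two groups in Figure 2, with normal subgroup $N = \mathbb{Z}_3$ and quotient $H = \mathbb{Z}_2$. It assumes $\kappa(h', g) = s(h')\, g\, s(h'h)^{-1}$ with $h$ the image of $g$ in $H$. That definition is not stated on this page; it is chosen here because it makes $n'\kappa(h',g)\,s(h'h) = g'g$ hold. All function names are hypothetical, and the code verifies only the group-theoretic decomposition, not an actual SSM.

```python
from itertools import product

IDENTITY = (0, 0)
G = list(product(range(3), range(2)))  # elements (r, f), r in Z3, f in Z2

def mul(a, b):
    """Semi-direct product law on Z3 x Z2: f1 acts on r2 by negation."""
    (r1, f1), (r2, f2) = a, b
    sign = -1 if f1 else 1
    return ((r1 + sign * r2) % 3, f1 ^ f2)

def inv(a):
    """Inverse found by search over the six group elements."""
    return next(b for b in G if mul(a, b) == IDENTITY)

def pi(g):
    """Quotient map G -> H = G/N, here the Z2 component."""
    return g[1]

def s(h):
    """Section H -> G picking one representative per coset."""
    return (0, h)

def kappa(h_prev, g):
    """Assumed definition: kappa(h', g) = s(h') g s(h' h)^{-1}."""
    h_next = h_prev ^ pi(g)
    return mul(mul(s(h_prev), g), inv(s(h_next)))

def two_layer_track(inputs):
    h, n = 0, IDENTITY
    for g in inputs:
        k = kappa(h, g)  # first-layer output, using its state before the update
        h = h ^ pi(g)    # first layer: Abelian update in H = Z2
        n = mul(n, k)    # higher layer: Abelian update inside N = Z3
    return mul(n, s(h))  # collective state n * s(h)

def direct_track(inputs):
    state = IDENTITY
    for g in inputs:
        state = mul(state, g)
    return state

# Exhaustive check over all input sequences of length 3.
all_match = all(two_layer_track(seq) == direct_track(seq)
                for seq in product(G, repeat=3))
print(all_match)
```

Each layer in this sketch only ever performs an Abelian update (XOR in $\mathbb{Z}_2$, addition mod 3 in $\mathbb{Z}_3$), while the combined state follows the non-Abelian product of $S_3$.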

Theorems & Definitions (13)

  • Definition 1: SSM Layer
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 2
  • Lemma 4
  • Proof
  • Proof
  • Proof
  • ...and 3 more