HiPPO Zoo: Explicit Memory Mechanisms for Interpretable State Space Models

Jack Goffinet, Casey Hanks, David E. Carlson

TL;DR

This work revisits the HiPPO framework and shows how polynomial representations of history can be extended to support capabilities of modern SSMs such as adaptive allocation of memory and associative memory while retaining direct interpretability in the OP basis.

Abstract

Representing the past in a compressed, efficient, and informative manner is a central problem for systems trained on sequential data. The HiPPO framework, originally proposed by Gu, Dao, et al., provides a principled approach to sequential compression by projecting signals onto orthogonal polynomial (OP) bases via structured linear ordinary differential equations. Subsequent works have embedded these dynamics in state space models (SSMs), where HiPPO structure serves as an initialization. Nonlinear successors of these SSM methods, such as Mamba, are state-of-the-art for many tasks with long-range dependencies, but the mechanisms by which they represent and prioritize history remain largely implicit. In this work, we revisit the HiPPO framework with the goal of making these mechanisms explicit. We show how polynomial representations of history can be extended to support capabilities of modern SSMs, such as adaptive allocation of memory and associative memory, while retaining direct interpretability in the OP basis. We introduce a unified framework comprising five such extensions, which we collectively refer to as a "HiPPO zoo." Each extension exposes a specific modeling capability through an explicit, interpretable modification of the HiPPO framework. The resulting models adapt their memory online and train in streaming settings with efficient updates. We illustrate the behaviors and modeling advantages of these extensions through a range of synthetic sequence modeling tasks, demonstrating that capabilities typically associated with modern SSMs can be realized through explicit, interpretable polynomial memory structures.
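To make "projecting signals onto OP bases via structured linear ordinary differential equations" concrete, the following is a minimal sketch of the scaled-Legendre (LegS) variant from the original HiPPO paper, not of any of the five extensions introduced here. It assumes the sign convention $\dot{c}(t) = \frac{1}{t}\,(A\,c(t) + B\,u(t))$, a bilinear discretization with unit step size, and NumPy; the function names `hippo_legs_matrices`, `hippo_legs_scan`, and `reconstruct` are hypothetical and chosen only for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def hippo_legs_matrices(n):
    """Return (A, B) for the LegS ODE dc/dt = (A c + B u) / t (assumed convention)."""
    q = np.arange(n, dtype=np.float64)
    scale = np.sqrt(2.0 * q + 1.0)
    # Strictly lower triangle: -sqrt(2i+1) * sqrt(2j+1); diagonal: -(i+1).
    A = -np.tril(np.outer(scale, scale), k=-1) - np.diag(q + 1.0)
    B = scale.copy()
    return A, B


def hippo_legs_scan(u, n, c0=None):
    """Stream a 1-D signal u through a bilinear-discretized LegS update.

    Returns an array of shape (len(u), n) holding the coefficient state
    after each input sample.
    """
    A, B = hippo_legs_matrices(n)
    eye = np.eye(n)
    c = np.zeros(n) if c0 is None else np.asarray(c0, dtype=np.float64)
    states = np.empty((len(u), n))
    for k, u_k in enumerate(u, start=1):
        # Bilinear (trapezoidal) step of dc/dt = (A c + B u) / t at t = k.
        lhs = eye - A / (2.0 * k)
        rhs = (eye + A / (2.0 * k)) @ c + (B / k) * u_k
        c = np.linalg.solve(lhs, rhs)
        states[k - 1] = c
    return states


def reconstruct(c, num_points=50):
    """Evaluate the history estimate sum_i c_i sqrt(2i+1) P_i(x) on x in [-1, 1].

    x = -1 corresponds to the start of the sequence and x = +1 to the present.
    """
    c = np.asarray(c, dtype=np.float64)
    scale = np.sqrt(2.0 * np.arange(len(c)) + 1.0)
    x = np.linspace(-1.0, 1.0, num_points)
    return legendre.legval(x, c * scale)
```

As a sanity check under these assumptions, a constant input $u \equiv 1$ has the state $(1, 0, \dots, 0)$ as a fixed point of the update, and `reconstruct` of that state is the constant function 1. The coefficient vector is directly readable as a polynomial summary of the history, which is the sense of "interpretability in the OP basis" used above.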

Paper Structure (72 sections, 77 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: A: A visualization of the time-warping interpretation of Salience HiPPO. Salience HiPPO is equivalent to a standard HiPPO system under an invertible time warp $\varphi$. Consequently, the static measures in warped time (left; green, red, and yellow) correspond to dynamic measures in real time (bottom; green, red, and yellow). B: Online learning of a continuous time-invariant system. The ground truth system (middle) has a nonzero second-order Volterra kernel. A linear HiPPO system learned online cannot capture this kernel. With a second-order (quadratic) Volterra HiPPO system, the correct kernel can be inferred (right), with faster convergence than an MLP.
  • Figure 2: Salience HiPPO applied to a selective copying task. Top: HiPPO is tasked with remembering only informative tokens (shown here projected as colors). White points are uninformative. When given gray input tokens, HiPPO is tasked with repeating each informative token in order. Second from top: Salience HiPPO measure functions place importance on informative points and not on uninformative points. Second from bottom: The state-dependent salience signal $g(t)$ is consistently lower at times when the input signal is uninformative. Bottom: Linear functionals used to predict output points show strong peaks at the corresponding informative timepoints, indicating strong performance on the task.
  • Figure 3: Associative Memory HiPPO uses OP associative memory to solve an associative recall task. Top: Task schematic. Middle: The system learns to read and write from consistent locations. Bottom: OP memory before and after a write operation.
  • Figure 4: Multiscale HiPPO provides stable and parsimonious explicit representations over a continuum of timescales. Top: A single Multiscale HiPPO system (red) remembers the past at many timescales, while vanilla HiPPO systems (blue) each remember a single timescale. Bottom: When tasked with reproducing 16 Legendre polynomial coefficients to summarize the past at a range of timescales, Multiscale HiPPO consistently outputs coefficients of the same or better quality than single-timescale HiPPO systems (with timescales $10^1,\dots, 10^4$, light to dark blue).
  • Figure 5: A: Forecasting HiPPO schematic. B: Forecasting HiPPO with a reduced-rank linear forecasting map reveals different "predictive memories" under different forecasting horizons. C: Objective-dependent history geometries in the top eigenfunctions of $Q$, the predictive history metric, show broader features for the long-horizon objective.
  • ...and 1 more figure
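
The time-warping statement in the Figure 1A caption can be illustrated with a short chain-rule sketch. This is not the paper's derivation: it assumes a time-invariant HiPPO system with fixed matrices $A$ and $B$ running in warped time $s = \varphi(t)$, and it assumes that the salience signal $g(t)$ shown in Figure 2 plays the role of the warp rate $\varphi'(t)$. The symbols $\tilde{c}$ and $\tilde{u}$ (state and input in warped time) are introduced here only for illustration.

```latex
% Assumed: a standard, time-invariant HiPPO system in warped time s.
\frac{d\tilde{c}}{ds}(s) = A\,\tilde{c}(s) + B\,\tilde{u}(s)

% Pull back to real time with s = \varphi(t), c(t) = \tilde{c}(\varphi(t)),
% u(t) = \tilde{u}(\varphi(t)), and apply the chain rule:
\frac{dc}{dt}(t) = \varphi'(t)\,\bigl(A\,c(t) + B\,u(t)\bigr)

% Assumed identification of the warp rate with the salience signal:
\varphi'(t) = g(t) > 0
\quad\Longrightarrow\quad
\frac{dc}{dt}(t) = g(t)\,\bigl(A\,c(t) + B\,u(t)\bigr)
```

Under these assumptions, $g(t) > 0$ keeps $\varphi$ strictly increasing and therefore invertible, and a small $g(t)$ slows the system's clock so that inputs arriving at those times occupy little warped time. This would be consistent with the decreased salience at uninformative timepoints described in the Figure 2 caption.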