HiPPO: Recurrent Memory with Optimal Polynomial Projections

Albert Gu; Tri Dao; Stefano Ermon; Atri Rudra; Christopher Ré

TL;DR

HiPPO reframes memory in sequential data as online function approximation: the input history is projected onto a polynomial basis with respect to a time-varying measure. The framework derives tractable ODE and recurrence updates for the projection coefficients, unifying the LegT, LagT, and LegS memory schemes and recovering the LMU from first principles. The new LegS variant uses a scaled Legendre measure to achieve timescale robustness without a timescale hyperparameter, along with bounded gradients and theoretical approximation error bounds. It reaches 98.3% accuracy on permuted MNIST, a new state of the art, and remains robust to timescale shifts and missing data. Empirically, HiPPO-integrated RNNs show strong long-range memory, fast online updates, and scalability to millions of time steps, suggesting broad applicability to long-horizon sequence modeling and time-series tasks.

Abstract

A central problem in learning from sequential data is representing cumulative history in an incremental fashion as more data is processed. We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases. Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem. As special cases, our framework yields a short derivation of the recent Legendre Memory Unit (LMU) from first principles, and generalizes the ubiquitous gating mechanism of recurrent neural networks such as GRUs. This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale. HiPPO-LegS enjoys the theoretical benefits of timescale robustness, fast updates, and bounded gradients. By incorporating the memory dynamics into recurrent neural networks, HiPPO RNNs can empirically capture complex temporal dependencies. On the benchmark permuted MNIST dataset, HiPPO-LegS sets a new state-of-the-art accuracy of 98.3%. Finally, on a novel trajectory classification task testing robustness to out-of-distribution timescales and missing data, HiPPO-LegS outperforms RNN and neural ODE baselines by 25-40% accuracy.
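
To make the online update concrete, here is a minimal sketch of a HiPPO-LegS style recurrence in NumPy. The abstract does not state the matrices, so the `A` and `B` below are an assumption: they follow the LegS definition as given in the full paper (lower-triangular `A` with entries `sqrt(2n+1)*sqrt(2k+1)` below the diagonal and `n+1` on it, and `B_n = sqrt(2n+1)`). The backward Euler discretization and the function names `legs_matrices` and `legs_online` are illustrative choices, not the authors' reference implementation.

```python
import numpy as np


def legs_matrices(N):
    """Build the (assumed) HiPPO-LegS matrices A (N x N) and B (N,)."""
    n = np.arange(N)
    r = np.sqrt(2 * n + 1)
    # Strictly lower triangle: sqrt(2n+1) * sqrt(2k+1); diagonal: n + 1.
    A = np.tril(np.outer(r, r), k=-1) + np.diag(n + 1.0)
    B = r.copy()
    return A, B


def legs_online(f, N):
    """Compress the sequence f into N coefficients, one sample at a time.

    Discretizes dc/dt = -(1/t) A c + (1/t) B f with backward Euler:
        (I + A / k) c_k = c_{k-1} + B f_k / k
    The step size cancels, so no timescale hyperparameter appears.
    """
    A, B = legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, fk in enumerate(f, start=1):
        c = np.linalg.solve(I + A / k, c + B * fk / k)
    return c


if __name__ == "__main__":
    # A constant signal should load almost entirely on the first coefficient.
    coeffs = legs_online(np.ones(2000), N=4)
    print(coeffs)
```

For a constant input, the first coefficient approaches 1 and the others approach 0 as more samples arrive, which matches the intuition that a constant function is captured by the degree-0 polynomial alone. Each step costs one small linear solve here; a faster update is possible by exploiting the structure of `A`, which this sketch does not attempt.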
