A Brief Survey on the Approximation Theory for Sequence Modelling
Haotian Jiang, Qianxiao Li, Zhong Li, Shida Wang
TL;DR
The survey develops a unified view of sequence modelling through classical approximation theory, recasting architectures such as RNNs, temporal CNNs, encoder–decoder models, and Transformers as hypothesis spaces for sequence-to-sequence functionals. It organizes existing results into universal approximation (density) results and, for several linear or memory-decaying settings, Jackson-type rate estimates and Bernstein-type inverse results, highlighting how memory properties shape approximation efficiency. Key findings include universal approximation by RNNs under fading memory, Jackson-type rates tied to memory decay for linear RNNs, and a comparison between RNNs and CNNs based on memory structure; rate results for Transformers remain largely open. The article also outlines practical goals (model selection and simplification) and mathematical directions (defining approximation spaces for sequences and studying optimization and generalization in the sequential setting). Together, these results offer a starting point for a principled theory of sequence modelling and for guiding architecture choice in practice.
Abstract
We survey current developments in the approximation theory of sequence modelling in machine learning. Particular emphasis is placed on classifying existing results for various model architectures through the lens of classical approximation paradigms, and on the insights one can gain from these results. We also outline some future research directions towards building a theory of sequence modelling.
