Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions
Yongqiang Cai
TL;DR
The paper studies universal approximation when learnable weights are replaced by a finite vocabulary of flow maps. It develops a constructive theory showing that a finite set $V$ of flow maps, with $|V|=O(d^2)$, suffices to approximate any continuous map on a compact domain via compositions, and similarly yields $C$-UAP for orientation-preserving diffeomorphisms under $L^p$ norms. The approach combines dynamical-systems concepts (affine and leaky-ReLU flow maps), the Lie product formula, and Kronecker's approximation to realize arbitrary flows through finite compositions, and then gives a two-part construction that leads to an explicit $V$. Beyond function approximation, the work introduces a compositional flow-language model (CFSM) that can represent regular languages via flow grammars, providing a bridge between formal language theory and continuous-time mappings. The results offer a new perspective on compositionality in machine learning and NLP, suggesting that sentence meanings could be embedded as nonlinear mappings composed from a finite, well-structured vocabulary of flow maps.
Abstract
In recent years, deep learning-based sequence modeling, such as language models, has received much attention and success, which pushes researchers to explore the possibility of transforming non-sequential problems into a sequential form. Following this thought, deep neural networks can be represented as composite functions of a sequence of mappings, linear or nonlinear, where each composition can be viewed as a \emph{word}. However, the weights of linear mappings are undetermined and hence require an infinite number of words. In this article, we investigate the finite case and constructively prove the existence of a finite \emph{vocabulary} $V=\{\phi_i: \mathbb{R}^d \to \mathbb{R}^d \mid i=1,\ldots,n\}$ with $n=O(d^2)$ for universal approximation. That is, for any continuous mapping $f: \mathbb{R}^d \to \mathbb{R}^d$, compact domain $\Omega$ and $\varepsilon>0$, there is a sequence of mappings $\phi_{i_1}, \ldots, \phi_{i_m} \in V$, $m \in \mathbb{Z}_+$, such that the composition $\phi_{i_m} \circ \cdots \circ \phi_{i_1}$ approximates $f$ on $\Omega$ with an error less than $\varepsilon$. Our results demonstrate an unusual approximation power of mapping compositions and motivate a novel compositional model for regular languages.
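
To make the idea of composing words from a finite vocabulary concrete, the following is a minimal one-dimensional sketch, not the paper's construction. It assumes a hypothetical two-word vocabulary on $\mathbb{R}$: the flow maps of $x' = 1$ at times $1$ and $-\sqrt{2}$, i.e. the translations $x \mapsto x+1$ and $x \mapsto x-\sqrt{2}$. Because $\sqrt{2}$ is irrational, Kronecker's approximation theorem implies that compositions of these two words can approximate a translation by any real amount. The names `VOCAB`, `compose`, and `find_sentence` are illustrative and do not come from the paper.

```python
import math

# Hypothetical two-word vocabulary on R (an assumption for illustration,
# not the paper's V): flow maps of x' = 1 at times 1 and -sqrt(2).
VOCAB = {
    "a": lambda x: x + 1.0,
    "b": lambda x: x - math.sqrt(2.0),
}


def compose(sentence):
    """Return the map obtained by applying the words of `sentence` in order."""
    def composed(x):
        for word in sentence:
            x = VOCAB[word](x)
        return x
    return composed


def find_sentence(shift, eps, max_b=100_000):
    """Brute-force search for counts (n_a, n_b) such that
    |n_a - n_b * sqrt(2) - shift| < eps, returned as a string of words."""
    root2 = math.sqrt(2.0)
    for n_b in range(max_b):
        n_a = round(shift + n_b * root2)  # best integer count of "a" for this n_b
        if n_a >= 0 and abs(n_a - n_b * root2 - shift) < eps:
            return "a" * n_a + "b" * n_b
    raise ValueError("no sentence found within the search budget")


# Approximate the target map f(x) = x + 0.3 using only the two fixed words.
sentence = find_sentence(shift=0.3, eps=1e-3)
f_approx = compose(sentence)
error = abs(f_approx(0.0) - 0.3)
print(len(sentence), error)
```

The sentence length grows as `eps` shrinks, which mirrors the trade-off in the abstract: the vocabulary stays fixed and finite, while the number of composed words $m$ depends on the target accuracy. This toy covers only translations on the line; the paper's vocabulary also needs nonlinear words to reach general continuous maps.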
