Position: Universal Time Series Foundation Models Rest on a Category Error

Xilin Dai; Wanxu Cai; Zhijian Xu; Qiang Xu

TL;DR

The paper argues that universal time-series foundation models commit a category error: they treat diverse time-series domains as a single modality, which causes negative transfer and yields models that behave like expensive "Generic Filters". It formalizes the limits of history-only prediction with the Autoregressive Blindness Bound, which shows that such predictors cannot anticipate unobserved interventions $U_t$, and it introduces the Causal Control Agent architecture (Perceiver-Controller-Solver), which uses external context and a JIT Solver to handle regime shifts. It advocates shifting benchmarks from zero-shot accuracy to drift-adaptation metrics such as Time-to-Recovery (TTR) and Intervention Recall, and an architectural separation in which Perceivers handle perception and Agents handle reasoning, guided by a control-theoretic mindset. The proposed paradigm aims at forecasting systems that adapt continually to interventions rather than rely on static, universal priors, with practical relevance for critical infrastructure and high-stakes deployments.
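The two drift-adaptation metrics named above can be made concrete with a small sketch. The definitions below are assumptions for illustration, not the paper's formal ones: TTR is taken as the number of steps after a known shift until the forecast error stays within a tolerance of its pre-shift mean, and Intervention Recall as the fraction of true interventions flagged within a maximum delay. The names `time_to_recovery`, `intervention_recall`, `tolerance`, `hold`, and `max_delay` are hypothetical.

```python
from typing import Optional, Sequence


def time_to_recovery(errors: Sequence[float], shift_index: int,
                     tolerance: float = 1.25, hold: int = 3) -> Optional[int]:
    """Steps after `shift_index` until the error is back near its pre-shift level.

    Assumed definition: recovery starts at the first step of a run of `hold`
    consecutive errors that are <= tolerance * (mean pre-shift error).
    Returns None if the error never recovers within the series.
    """
    if shift_index <= 0 or shift_index >= len(errors):
        raise ValueError("shift_index must leave data on both sides of the shift")
    baseline = sum(errors[:shift_index]) / shift_index
    threshold = tolerance * baseline
    run = 0
    for t in range(shift_index, len(errors)):
        run = run + 1 if errors[t] <= threshold else 0
        if run == hold:
            # Index of the first step in the recovered run, counted from the shift.
            return t - hold + 1 - shift_index
    return None


def intervention_recall(true_times: Sequence[int], detected_times: Sequence[int],
                        max_delay: int = 2) -> float:
    """Fraction of true interventions detected at most `max_delay` steps late."""
    if not true_times:
        raise ValueError("need at least one true intervention")
    hits = sum(
        any(0 <= d - t <= max_delay for d in detected_times)
        for t in true_times
    )
    return hits / len(true_times)


# Error spikes at step 4 and settles again from step 7 onward.
errors = [1.0, 1.0, 1.0, 1.0, 5.0, 4.0, 2.0, 1.1, 1.0, 1.0]
ttr = time_to_recovery(errors, shift_index=4)
recall = intervention_recall([10, 50, 90], [11, 70, 90])
```

Under these assumed definitions, a lower TTR and a higher recall both indicate faster adaptation to drift; the `hold` parameter guards against counting a single lucky step as recovery.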

Abstract

This position paper argues that the pursuit of "Universal Foundation Models for Time Series" rests on a fundamental category error, mistaking a structural Container for a semantic Modality. We contend that because time series hold incompatible generative processes (e.g., finance vs. fluid dynamics), monolithic models degenerate into expensive "Generic Filters" that fail to generalize under distributional drift. To address this, we introduce the "Autoregressive Blindness Bound," a theoretical limit proving that history-only models cannot predict intervention-driven regime shifts. We advocate replacing universality with a Causal Control Agent paradigm, where an agent leverages external context to orchestrate a hierarchy of specialized solvers, from frozen domain experts to lightweight Just-in-Time adaptors. We conclude by calling for a shift in benchmarks from "Zero-Shot Accuracy" to "Drift Adaptation Speed" to prioritize robust, control-theoretic systems.
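The orchestration idea in the abstract can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation: the Perceiver is a hypothetical keyword matcher that turns external text into a scalar intervention signal $U_t$, the "frozen domain expert" is a long-run mean, the "Just-in-Time adaptor" is a mean over a short recent window, and the Controller is a simple threshold rule. All function and parameter names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Forecaster = Callable[[Sequence[float]], float]


def perceive(context: str, keywords: Dict[str, float]) -> float:
    """Hypothetical Perceiver: map external text to an intervention signal U_t."""
    text = context.lower()
    return max((w for k, w in keywords.items() if k in text), default=0.0)


def frozen_expert(history: Sequence[float]) -> float:
    """Stand-in for a frozen domain expert: long-run mean of the full history."""
    return sum(history) / len(history)


def jit_solver(history: Sequence[float], window: int = 3) -> float:
    """Stand-in for a Just-in-Time adaptor: use only the most recent window."""
    recent = list(history)[-window:]
    return sum(recent) / len(recent)


@dataclass
class Controller:
    """Hypothetical Controller: route to the JIT solver when U_t is large."""
    threshold: float = 0.5

    def select(self, u_t: float) -> Forecaster:
        return jit_solver if u_t >= self.threshold else frozen_expert


def forecast(history: Sequence[float], context: str,
             keywords: Dict[str, float], controller: Controller) -> float:
    u_t = perceive(context, keywords)   # external context -> intervention signal
    solver = controller.select(u_t)     # choose which solver to trust
    return solver(history)


history = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
keywords = {"outage": 1.0}
calm = forecast(history, "Routine day", keywords, Controller())
shifted = forecast(history, "Grid outage reported", keywords, Controller())
```

With no intervention signal the sketch falls back to the long-run mean; when the context flags an event it trusts only the post-shift window. The design point being illustrated is the separation of concerns: perception, solver selection, and forecasting are independent pieces.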

Paper Structure (40 sections, 1 theorem, 7 equations, 4 figures, 1 table)

Introduction: The Universal Prior Fallacy
Related Works
Foundation Models for Time Series
Debate of Model Architectures
Distributional Discrepancy
Novel Paradigms
From Multimodal Forecasting to Causal Inference
Test-Time Training (TTT)
Agents for Forecasting
Argument I: The Category Error
Modality vs. Data Type
A Formal, Group-Theoretic Proof
Sample Complexity Lower Bound
"No Free Lunch" and Incompressibility
The "Generic Filter" Hypothesis
...and 25 more sections

Key Result

Theorem 3.3

To learn a universal function class $\mathcal{F}$ over a data space treated as a Container (i.e., with a trivial shared symmetry group $G=\{e\}$), the number of samples $N$ required to achieve an error $\epsilon$ has a worst-case exponential dependency on the ambient dimension $d$.
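To give a feel for what an exponential dependence on $d$ means in practice, the sketch below counts grid points in a textbook covering argument. It is an illustration under assumptions, not the theorem's bound: the target is taken to be an arbitrary 1-Lipschitz function (in the sup-norm metric) on the unit cube $[0,1]^d$ with no shared symmetry to exploit, and a nearest-sample predictor is guaranteed error at most $\epsilon$ only if every point lies within $\epsilon$ of a sample. The function name `grid_samples_needed` is hypothetical.

```python
import math


def grid_samples_needed(eps: float, d: int) -> int:
    """Grid points needed so every point of [0, 1]^d is within eps (sup norm) of one.

    Each axis is split into cells of side 2 * eps with a sample at each cell
    centre, giving ceil(1 / (2 * eps)) points per axis and that count to the
    power d overall.
    """
    if eps <= 0 or d < 1:
        raise ValueError("eps must be positive and d at least 1")
    per_axis = math.ceil(1.0 / (2.0 * eps))
    return per_axis ** d


# Same target accuracy, growing ambient dimension.
counts = {d: grid_samples_needed(0.125, d) for d in (1, 2, 10)}
```

At a fixed accuracy of 0.125 the count grows from 4 points in one dimension to 4^10 (about a million) in ten, which is the kind of growth a shared symmetry group is meant to avoid.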

Figures (4)

Figure 1: The Category Error: Time series as a structural container. Disparate generative processes (e.g., finance, biology, physics) are treated as a single modality, leading to negative transfer when a monolithic model is trained on their union.
Figure 2: The Generic Filter Hypothesis illustrated by attention collapse. A large Transformer's attention mechanism degenerates from a potentially complex pattern into a simple diagonal band, effectively mimicking a moving average filter.
Figure 3: The Autoregressive Blindness Bound. A history-only model ($f(X_h)$) can only extrapolate past patterns. It is blind to the external intervention ($U_t$, e.g., a news event) that causes the true system dynamics to diverge, leading to an irreducible error.
Figure 4: Conceptual architecture of the Causal Control Agent. The Perceiver processes external data to form an intervention signal $U_t$, which the Controller uses to select and modulate a JIT Solver.
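The blindness effect described in the Figure 3 caption can be reproduced in a toy simulation. The setup is an assumption chosen for simplicity, not the paper's formal statement: a linear process $X_t = a X_{t-1} + b U_t + \varepsilon_t$ in which $U_t$ is an independent Bernoulli($p$) intervention and $\varepsilon_t$ is Gaussian noise. The best history-only predictor can use only the average effect $b p$, so its mean squared error exceeds that of a context-aware predictor by roughly $b^2 p (1-p)$. All parameter names and values are illustrative.

```python
import numpy as np


def simulate(n: int = 50_000, a: float = 0.8, b: float = 5.0,
             p: float = 0.05, sigma: float = 1.0, seed: int = 0):
    """Return (MSE of history-only predictor, MSE of context-aware predictor)."""
    rng = np.random.default_rng(seed)
    u = (rng.random(n) < p).astype(float)   # unobserved interventions U_t
    noise = rng.normal(0.0, sigma, n)

    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + b * u[t] + noise[t]

    prev, target = x[:-1], x[1:]
    # History-only: knows the dynamics but sees only the mean effect of U_t.
    history_only = a * prev + b * p
    # Context-aware: additionally observes the intervention itself.
    with_context = a * prev + b * u[1:]

    mse_hist = float(np.mean((target - history_only) ** 2))
    mse_ctx = float(np.mean((target - with_context) ** 2))
    return mse_hist, mse_ctx


mse_hist, mse_ctx = simulate()
gap = mse_hist - mse_ctx          # empirical excess error from not seeing U_t
expected_gap = 5.0 ** 2 * 0.05 * (1 - 0.05)
```

Both predictors here are given the true coefficients, so the gap is not a fitting artifact: no amount of extra history closes it, because the missing information is in $U_t$ rather than in the past of $X_t$.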

Theorems & Definitions (3)

Definition 3.1: Container (Data Type)
Definition 3.2: Modality
Theorem 3.3: Sample Complexity Lower Bound for Universal Approximation
