TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model
Defu Cao, Wen Ye, Yizhou Zhang, Yan Liu
TL;DR
The paper tackles the need for a generalizable foundation model for time series that can handle missing data, multi-resolution sampling, and uncertainty. It proposes TimeDiT, a diffusion-transformer framework with a unified masking scheme and physics-informed sampling that injects PDE priors during inference. TimeDiT demonstrates strong zero-shot and fine-tuned performance across forecasting, imputation, anomaly detection, and data generation, with notable gains in uncertainty quantification and domain knowledge integration. By serving as a proto-foundation model, TimeDiT bridges the gap between universal temporal modeling and domain-specific needs, offering efficient sampling and flexible integration of external knowledge.
Abstract
Foundation models, particularly Large Language Models (LLMs), have revolutionized text and video processing, yet time series data presents distinct challenges for such approaches because of domain-specific features such as missing values and multi-resolution characteristics. Furthermore, the de facto autoregressive transformers tend to learn deterministic temporal dependencies from pre-training data while overlooking inherent uncertainties and lacking integration of physical constraints. In this paper, we introduce TimeDiT, a diffusion transformer model that combines transformer-based temporal dependency learning with diffusion-based probabilistic sampling. TimeDiT employs a unified masking mechanism to harmonize the training and inference processes across diverse tasks, and it introduces a theoretically grounded, finetuning-free model editing strategy that enables flexible integration of external knowledge during sampling. Acknowledging the challenges of unifying multiple downstream tasks under a single model, our systematic evaluation demonstrates TimeDiT's effectiveness both in fundamental tasks (forecasting and imputation, in zero-shot and fine-tuned settings) and in domain tasks (multi-resolution forecasting, anomaly detection, and data generation), establishing it as a proto-foundation model that bridges the gap between general-purpose and domain-specific models.
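To make the idea of a unified masking mechanism concrete, the sketch below shows how forecasting and imputation can both be written as a binary mask over one window, where 1 marks an observed (conditioning) step and 0 marks a step the model must generate. This is an illustrative assumption, not the paper's implementation: the abstract does not specify the mask types, and the function names `forecast_mask`, `imputation_mask`, and `split_by_mask` are hypothetical.

```python
import random


def forecast_mask(length, horizon):
    """Hypothetical forecasting mask: the last `horizon` steps are hidden."""
    if not 0 <= horizon <= length:
        raise ValueError("horizon must be between 0 and length")
    return [1] * (length - horizon) + [0] * horizon


def imputation_mask(length, missing_ratio, seed=0):
    """Hypothetical imputation mask: a random subset of steps is hidden."""
    if not 0.0 <= missing_ratio <= 1.0:
        raise ValueError("missing_ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mask is reproducible
    n_missing = round(length * missing_ratio)
    hidden = set(rng.sample(range(length), n_missing))
    return [0 if t in hidden else 1 for t in range(length)]


def split_by_mask(series, mask):
    """Return the conditioning view of the series and the target positions.

    Hidden steps are replaced by None in the conditioning view; a real model
    would instead fill them with noise or a placeholder value (assumption).
    """
    if len(series) != len(mask):
        raise ValueError("series and mask must have the same length")
    observed = [x if m else None for x, m in zip(series, mask)]
    targets = [t for t, m in enumerate(mask) if not m]
    return observed, targets


if __name__ == "__main__":
    series = [0.1, 0.4, 0.3, 0.8, 0.7, 0.9]
    print(split_by_mask(series, forecast_mask(len(series), horizon=2)))
    print(split_by_mask(series, imputation_mask(len(series), missing_ratio=0.5)))
```

Under this framing, the downstream task only changes which mask is built; the code that consumes the mask stays the same, which is one plausible reading of how a single masking scheme can cover both training and inference across tasks.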
