Generative Pre-Trained Diffusion Paradigm for Zero-Shot Time Series Forecasting

Jiarui Yang; Tao Dai; Naiqi Li; Junxi Wu; Peiyuan Liu; Jinmin Li; Jigang Bao; Haigang Zhang; Shutao Xia

TL;DR

The paper introduces Generative Pre-trained Diffusion (GPD), a zero-shot time-series forecasting paradigm that pre-trains an unconditional diffusion model on time-series data and uses it as a foundation model. By using historical sequences as prompts and applying posterior sampling, GPD generates forecasts over flexible horizons without task-specific fine-tuning, and it performs competitively with state-of-the-art LLM-based and diffusion-based methods on cross-domain, long-term, short-term, and zero-shot tasks. Key contributions include a simple MLP diffusion backbone, a tuning-free zero-shot forecasting mechanism, and experiments on nine real-world datasets that examine generalization and robustness. The result is a unified, interpretable diffusion-based foundation model for time series that handles arbitrary history and forecast lengths while mitigating concept drift.

Abstract

In recent years, generative pre-trained paradigms such as Large Language Models (LLMs) and Large Vision Models (LVMs) have achieved revolutionary advancements and widespread real-world applications. In particular, pre-trained LLM-based time series methods have demonstrated better generalization and robustness than previous deep model approaches, showcasing the potential of generative pre-trained paradigms as foundation models for time series. However, these LLM-based works mainly focus on cross-modal research, i.e., leveraging the language capabilities of LLMs in time series contexts. Although they have achieved impressive performance, two issues remain: concept drift caused by differences in data distribution, and inflexibility caused by misalignment of dimensions. To this end, inspired by recent work on LVMs, we reconsider the paradigm of time series modeling. In this paper, we comprehensively explore, for the first time, the effectiveness and superiority of the Generative Pre-trained Diffusion (GPD) paradigm in real-world multivariate time series forecasting (TSF). Specifically, to mitigate performance bias introduced by sophisticated networks, we propose a straightforward MLP diffusion network for unconditional modeling of time series. We then employ a zero-shot, tuning-free method to predict (generate) future data using historical data as prompts. The GPD paradigm is built on the time series modality itself, which effectively prevents concept drift and enables flexible forecasting of arbitrary lengths. We demonstrate that the GPD paradigm achieves overall performance and generalization comparable to current SOTA LLM-based and deep model paradigms on mainstream benchmarks and various TSF tasks. Extensive experiments validate the potential of the GPD paradigm and its usefulness for future related research.
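To make the idea of "historical data as prompts" for an unconditional diffusion model concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it uses replacement (inpainting-style) conditioning, which may differ from the posterior sampling procedure the authors use, and it swaps the trained MLP denoiser for a hypothetical stand-in, `toy_eps_model`, the exact noise predictor for a zero-mean Gaussian prior with AR(1)-style covariance. All function names, the noise schedule, and the `rho` parameter are assumptions made for illustration.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Linear DDPM noise schedule (assumed; not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def ar1_cov(n, rho):
    """Covariance rho^|i-j| of a stationary AR(1)-like window of length n."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def toy_eps_model(x_t, alpha_bar_t, cov):
    """Hypothetical stand-in for a trained unconditional denoiser.

    For x_0 ~ N(0, cov), the optimal noise prediction is
    E[eps | x_t] = sqrt(1 - a) * (a * cov + (1 - a) * I)^{-1} x_t.
    """
    c_t = alpha_bar_t * cov + (1.0 - alpha_bar_t) * np.eye(len(x_t))
    return np.sqrt(1.0 - alpha_bar_t) * np.linalg.solve(c_t, x_t)


def forecast_with_prompt(history, horizon, rho=0.9, num_steps=50, seed=0):
    """Sample a forecast by prompting an unconditional model with history."""
    history = np.asarray(history, dtype=float)
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    h = len(history)
    cov = ar1_cov(h + horizon, rho)
    x = rng.standard_normal(h + horizon)  # start from pure noise
    for t in range(num_steps - 1, -1, -1):
        a_bar = alpha_bars[t]
        # Prompt step: overwrite the known part with the history,
        # noised to the current diffusion level t.
        x[:h] = np.sqrt(a_bar) * history + np.sqrt(1.0 - a_bar) * rng.standard_normal(h)
        # Unconditional reverse (ancestral) step on the whole window.
        eps_hat = toy_eps_model(x, a_bar, cov)
        mean = (x - betas[t] / np.sqrt(1.0 - a_bar) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(h + horizon) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x[h:]  # only the generated future part


if __name__ == "__main__":
    past = np.sin(np.linspace(0.0, 3.0, 24))
    print(forecast_with_prompt(past, horizon=12))
```

Because the denoiser is unconditional, the split between history and forecast is chosen only at sampling time; calling `forecast_with_prompt(past[:10], horizon=5)` requires no retraining in this sketch, which is the kind of flexibility the abstract describes.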

Paper Structure (21 sections, 9 equations, 11 figures, 8 tables)

Introduction
Method
Preliminary
Pre-Trained Diffusion Model
Zero-Shot Prompt Forecasting
Experiments
Datasets and Baselines
Implementation Details
Cross-Domain Time Series Forecasting
Long and Short-Term Time Series Forecasting
Zero-Shot Forecasting
Model Analysis
Related Works
Large Language Models for Time Series
Diffusion Models for Time Series
...and 6 more sections

Figures (11)

Figure 1: (a) The exclusive model establishes a singular mapping from history to future within a specific domain. (b) The unified model leverages the capabilities of LLMs and domain-specific textual instructions to differentiate and construct complex mappings. (c) The diffusion model establishes global underlying statistical characteristics across different domains. Note that paradigms (a) and (b) require retraining when altering the length of historical context. The diffusion paradigm offers flexibility in adjusting the length of historical prompts.
Figure 2: Comparison of comprehensive performance of different paradigms.
Figure 3: t-SNE Visualization of GPD modeling capability.
Figure 4: The impact of different sampling step settings and prediction strategies on forecasting performance. $\gg 1$ indicates that the metric is far greater than one.
Figure 5: Quantitative comparison of different diffusion-based methods in short-term forecasting.
...and 6 more figures
