DDTime: Dataset Distillation with Spectral Alignment and Information Bottleneck for Time-Series Forecasting

Yuqi Li, Kuiye Ding, Chuanguang Yang, Hao Wang, Haoxuan Wang, Huiran Duan, Junming Liu, Yingli Tian

TL;DR

DDTime tackles two core problems in time-series dataset distillation: autocorrelation-induced bias in value-term alignment and limited diversity among synthetic samples. It introduces a frequency-domain value term that decorrelates horizon components and an ISIB mechanism, an inter-sample regularizer inspired by the information bottleneck principle, that maximizes information density across synthetic trajectories; both are packaged as a plug-in compatible with first-order condensation. Empirically, DDTime delivers about 30% relative accuracy gains across 20 benchmarks at about 2.49% computational overhead, and under favorable conditions distilled subsets can outperform full-data training. The framework is architecture-agnostic and comes with practical guidelines for the synthetic data size, the balance parameter $\alpha$, and the diversity weight $\lambda_{\mathrm{IS}}$, making it a robust, scalable approach to data condensation for time-series forecasting (TSF).

Abstract

Time-series forecasting is fundamental across many domains, yet training accurate models often requires large-scale datasets and substantial computational resources. Dataset distillation offers a promising alternative by synthesizing compact datasets that preserve the learning behavior of full data. However, extending dataset distillation to time-series forecasting is non-trivial due to two fundamental challenges: (1) temporal bias from strong autocorrelation, which leads to distorted value-term alignment between teacher and student models; and (2) insufficient diversity among synthetic samples, arising from the absence of explicit categorical priors to regularize trajectory variety. In this work, we propose DDTime, a lightweight, plug-in distillation framework built upon first-order condensation decomposition. To tackle Challenge 1, we revisit value-term alignment through temporal statistics and introduce a frequency-domain alignment mechanism to mitigate autocorrelation-induced bias, ensuring spectral consistency and temporal fidelity. To address Challenge 2, we further design an inter-sample regularization inspired by the information bottleneck principle, which enhances diversity and maximizes information density across synthetic trajectories. The combined objective is theoretically compatible with a wide range of condensation paradigms and supports stable first-order optimization. Extensive experiments on 20 benchmark datasets and diverse forecasting architectures demonstrate that DDTime consistently outperforms existing distillation methods, achieving about 30% relative accuracy gains while introducing about 2.49% computational overhead. All code and distilled datasets will be released.
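
The abstract does not give the alignment loss in closed form. As a rough sketch only, the frequency-domain alignment idea can be pictured as blending a time-domain error with a spectral error between teacher and student forecasts. The function name `temporal_frequency_loss`, the choice of MSE for the temporal term, the mean DFT-coefficient magnitude for the spectral term, and the convention that `alpha` weights the spectral term are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def temporal_frequency_loss(student_pred, teacher_pred, alpha=0.5):
    """Blend a time-domain and a frequency-domain discrepancy (illustrative).

    alpha weights the spectral term and (1 - alpha) the temporal term;
    this convention is an assumption, not taken from the paper.
    """
    student_pred = np.asarray(student_pred, dtype=float)
    teacher_pred = np.asarray(teacher_pred, dtype=float)

    # Time-domain term: mean squared error over the forecast horizon.
    temporal = np.mean((student_pred - teacher_pred) ** 2)

    # Frequency-domain term: mean magnitude of the difference between
    # the real-input DFT coefficients, taken along the horizon axis.
    spec_diff = (np.fft.rfft(student_pred, axis=-1)
                 - np.fft.rfft(teacher_pred, axis=-1))
    spectral = np.mean(np.abs(spec_diff))

    return alpha * spectral + (1.0 - alpha) * temporal

# Usage: a teacher forecast and a student forecast with a constant offset.
teacher = np.sin(np.linspace(0.0, 2.0 * np.pi, 24))
student = teacher + 0.1
print(temporal_frequency_loss(student, teacher, alpha=0.5))
```

With `alpha=0.0` the sketch reduces to a plain time-domain MSE, which makes it easy to compare against a purely temporal baseline.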

Paper Structure

This paper contains 35 sections, 2 theorems, 19 equations, 9 figures, 21 tables.

Key Result

Lemma 1

Under a first-order Taylor approximation of $M_\theta$ around $\theta_T$, and assuming Lipschitz continuity of $\ell$, the intractable test objective in Eq. (eq:test-risk) can be upper-bounded by two optimizable terms that depend only on quantities available during condensation: Where $\|\cdot\|$ is a suitable norm and the hidden constants depend on the local Lipschitz smoothness of $M_\theta$ and

Figures (9)

  • Figure 1: Left: Performance of DDTime. Average results (MSE) are reported following CondTSF [ding2024condtsf]; see details in Table \ref{tab:average}. Right: (a) and (b): Unlike previous distillation frameworks that operate purely in the temporal domain, our method introduces frequency-domain alignment and diversity regularization to enrich synthetic samples. (c) Quantitative comparison of data diversity shows that our method achieves consistently higher diversity scores than CondTSF [ding2024condtsf] and DATM [guo2024datm].
  • Figure 2: Overall framework of the proposed DDTime. The framework consists of three key components: (i) a Data Diversity module that enhances inter-sample diversity by increasing the pairwise KL divergence among synthetic sequences (the rose plots on the left visualize the KL divergence magnitude of each sample as a sector), (ii) a Teacher–Student paradigm that enforces parameter matching between learners trained on real and synthetic data, and (iii) a joint Temporal–Frequency Alignment loss that balances time-domain and spectral-domain supervision via the weighting coefficient $\alpha$. The TSF network illustrated is the sDLinear architecture [dlinear]. Here, $\mathcal{F}(\cdot)$ denotes the discrete Fourier transform (DFT).
  • Figure 3: Visualization of synthetic time-series samples generated by DATM and DATM+Ours on the Electricity dataset. The horizontal axis shows the time step and the vertical axis shows normalized time-series values; blue and red lines indicate inputs and outputs. DATM+Ours produces label segments with richer periodic structures and more natural local oscillations.
  • Figure 4: Parameter sensitivity analysis of different distillation baselines integrated with our ISIB-regularization. The coefficient $\lambda_{\mathrm{IS}}$ controls the strength of the data diversity loss in our method. Each curve shows how the forecasting error (MAE) changes with varying $\lambda_{\mathrm{IS}}$; curves are smoothed for clarity, and the optimal points are highlighted with red stars.
  • Figure 5: Visualization of Best (pink) and Second Best (blue) results across four representative datasets. Each block represents a specific sample size setting ($S = 3, 5, 10, 20$). Red-bordered regions correspond to the original method, while blue-bordered regions show the results of our frequency domain alignment variant. Gray cells indicate remaining cases.
  • ...and 4 more figures
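
The Figure 2 caption describes the Data Diversity module as increasing the pairwise KL divergence among synthetic sequences, with $\lambda_{\mathrm{IS}}$ (Figure 4) controlling its strength. A minimal sketch of such a regularizer is given below. How DDTime turns a sequence into a probability distribution is not stated in this summary, so the softmax over time steps, the names `mean_pairwise_kl` and `diversity_loss`, and the averaging over ordered pairs are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def mean_pairwise_kl(sequences, eps=1e-12):
    """Average KL(p_i || p_j) over all ordered pairs i != j.

    Each row is mapped to a distribution with a softmax over time steps
    (an assumption). Requires at least two sequences.
    """
    p = softmax(np.asarray(sequences, dtype=float), axis=-1)  # shape (N, L)
    log_p = np.log(p + eps)
    # kl[i, j] = sum_t p[i, t] * (log p[i, t] - log p[j, t])
    kl = np.sum(p[:, None, :] * (log_p[:, None, :] - log_p[None, :, :]),
                axis=-1)
    off_diagonal = ~np.eye(p.shape[0], dtype=bool)
    return kl[off_diagonal].mean()

def diversity_loss(sequences, lambda_is=1.0):
    """Negative mean pairwise KL, so minimizing it increases diversity."""
    return -lambda_is * mean_pairwise_kl(sequences)

# Usage: identical sequences have zero pairwise KL; varied ones do not.
identical = [[0.0, 1.0, 2.0, 3.0]] * 3
varied = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0], [0.0, 3.0, 0.0, 3.0]]
print(mean_pairwise_kl(identical), mean_pairwise_kl(varied))
```

Added to a condensation objective, a term like `diversity_loss` would push synthetic sequences apart; the sensitivity curves in Figure 4 suggest the weight needs tuning per baseline.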

Theorems & Definitions (2)

  • Lemma 1: First-order condensation decomposition [ding2024condtsf]
  • Lemma 2: Decorrelation Property of FFT [wang2025fredf]
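
The statement of Lemma 2 is not reproduced in this summary. The general idea behind a decorrelation property of the Fourier transform can still be checked numerically: for a strongly autocorrelated series, neighboring time steps are highly correlated, while DFT coefficients are much less so. The sketch below uses a synthetic AR(1) process as a stand-in for real forecasting labels; the process, its coefficient `phi=0.9`, the window length, and the helper names are assumptions for illustration.

```python
import numpy as np

def simulate_ar1(n_series, length, phi, rng, burn_in=200):
    """Draw independent AR(1) windows x_t = phi * x_{t-1} + e_t."""
    noise = rng.standard_normal((n_series, length + burn_in))
    x = np.zeros_like(noise)
    for t in range(1, length + burn_in):
        x[:, t] = phi * x[:, t - 1] + noise[:, t]
    # Drop the burn-in so the windows are close to stationary.
    return x[:, burn_in:]

def mean_offdiag_correlation(features):
    """Mean |correlation| between distinct columns (real or complex)."""
    z = features - features.mean(axis=0, keepdims=True)
    cov = z.conj().T @ z / z.shape[0]
    std = np.sqrt(np.real(np.diag(cov)))
    corr = cov / np.outer(std, std)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    return np.abs(corr[off_diagonal]).mean()

rng = np.random.default_rng(0)
windows = simulate_ar1(n_series=4000, length=32, phi=0.9, rng=rng)

# Correlation between time steps vs. between DFT components.
time_corr = mean_offdiag_correlation(windows)
freq_corr = mean_offdiag_correlation(np.fft.rfft(windows, axis=1))
print(f"time domain: {time_corr:.3f}, frequency domain: {freq_corr:.3f}")
```

On this toy process the frequency-domain correlation should come out clearly lower than the time-domain one, though not zero: with a finite window, spectral leakage leaves some residual correlation between DFT components.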