Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting

Yifan Hu, Jie Yang, Tian Zhou, Peiyuan Liu, Yujin Tang, Rong Jin, Liang Sun

TL;DR

TimeAlign tackles the distribution gap between historical inputs and future targets in time series forecasting with a dual-branch framework: a prediction path is coupled with a target-aligned reconstruction path, and distribution-aware alignment is enforced between them. The approach preserves both low-frequency and high-frequency dynamics and comes with two theoretical justifications: reconstruction improves forecasting generalization, and alignment increases the mutual information between learned representations and future targets. Extensive experiments across eight benchmarks show superior performance and demonstrate plug-and-play applicability with various forecasters. The work offers a practical, principled route to improving robustness under distribution shift in time series forecasting.

Abstract

Although contrastive and other representation-learning methods have long been explored in vision and NLP, their adoption in modern time series forecasters remains limited. We believe they hold strong promise for this domain. To unlock this potential, we explicitly align past and future representations, thereby bridging the distributional gap between input histories and future targets. To this end, we introduce TimeAlign, a lightweight, plug-and-play framework that establishes a new representation paradigm, distinct from contrastive learning, by aligning auxiliary features via a simple reconstruction task and feeding them back into any base forecaster. Extensive experiments across eight benchmarks verify its superior performance. Further studies indicate that the gains arise primarily from correcting frequency mismatches between historical inputs and future outputs. Additionally, we provide two theoretical justifications for how reconstruction improves forecasting generalization and how alignment increases the mutual information between learned representations and predicted targets. The code is available at https://github.com/TROUBADOUR000/TimeAlign.

Paper Structure

This paper contains 47 sections, 44 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Comparison among history, ground truth, forecast with and without alignment using different visualization perspectives. (a) Time Series Visualization. (b) Token (Patch)-wise Cosine Similarity. (c) Spectrogram Map. (d) t-SNE Embedding Space. More examples are given in the appendix.
  • Figure 2: (a) The original paradigm of deep learning forecasters. Distributions are extracted from history and mapped to the prediction space. (b) The paradigm of TimeAlign. Joint optimization of the predict and reconstruct branches provides distributional alignment. (c) High-frequency energy ratio. For different datasets, the threshold between high- and low-frequency bands is determined adaptively via knee-point detection of the cumulative energy distribution. (d) High-frequency similarity. Pearson correlations are computed on the high-frequency components between the ground truth and, in turn, the history, the forecast with alignment, and the forecast without alignment. IMP. denotes improvement.
  • Figure 3: Overall architecture of TimeAlign. (i) The Predict Branch maps history to forecasts (during both training and inference), with a replaceable backbone. (ii) The Reconstruct Branch reconstructs targets to capture the distribution (training only). (iii) Distribution-Aware Alignment aligns the predict and reconstruct representations via global and local mechanisms. (iv) A Simple Encoder is the default lightweight design for the Reconstruct Branch and for the default Predict Branch.
  • Figure 4: The t-SNE visualization illustrates the distribution of the history, the ground truth and the forecasts produced by TimeAlign, iTransformer, DLinear, iTransformer+TimeAlign and DLinear+TimeAlign on the ECL, Traffic, and ETTm1 datasets. The TimeAlign forecasts almost perfectly overlap with the ground truth manifold, whereas the predictions from vanilla iTransformer and DLinear exhibit obvious distributional divergence. Plugging in TimeAlign visibly collapses this gap, steering backbones toward the target distribution.
  • Figure 5: Left: Model efficiency comparison on the ETTm2 and ECL datasets. Right: Training iteration vs. MSE plot. Model training becomes more efficient and effective.
  • ...and 4 more figures
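
The frequency diagnostics described in the Figure 2 caption, panels (c) and (d), can be illustrated with a small NumPy sketch. This is not the authors' implementation. The knee-point rule used here (the point on the normalized cumulative energy curve farthest above the chord joining its endpoints) is an assumed heuristic, the mean is removed before the FFT as a further assumption, and the cutoff is computed per series rather than per dataset. All function names are hypothetical.

```python
import numpy as np


def power_spectrum(x):
    """One-sided power spectrum of a 1-D series, with the mean removed (assumption)."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.rfft(x - x.mean())) ** 2


def knee_cutoff(power):
    """Frequency-bin index at the knee of the cumulative energy curve.

    Assumed heuristic: pick the bin where the normalized cumulative energy
    lies farthest above the straight chord from its first to its last point.
    """
    cum = np.cumsum(power) / np.sum(power)
    t = np.linspace(0.0, 1.0, len(cum))
    chord = cum[0] + (cum[-1] - cum[0]) * t
    return int(np.argmax(cum - chord))


def high_freq_energy_ratio(x):
    """Share of spectral energy strictly above the knee bin, plus the knee bin."""
    power = power_spectrum(x)
    k = knee_cutoff(power)
    return float(power[k + 1:].sum() / power.sum()), k


def high_freq_component(x, k):
    """Time-domain signal after zeroing every frequency bin up to and including k."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x - x.mean())
    spec[: k + 1] = 0.0
    return np.fft.irfft(spec, n=len(x))


def high_freq_similarity(a, b, k):
    """Pearson correlation between the high-frequency parts of two series."""
    ha, hb = high_freq_component(a, k), high_freq_component(b, k)
    return float(np.corrcoef(ha, hb)[0, 1])


# Toy series: a strong slow sine plus a weak fast sine (synthetic, not paper data).
t = np.arange(256)
low = np.sin(2 * np.pi * 4 * t / 256)
high = 0.3 * np.sin(2 * np.pi * 60 * t / 256)
x = low + high

ratio, k = high_freq_energy_ratio(x)
print("knee bin:", k, "high-frequency energy ratio:", round(ratio, 4))
print("similarity to itself:", high_freq_similarity(x, x, k))
```

In a forecasting setting, `high_freq_similarity` would be called with the ground truth as one argument and the history or a forecast as the other, using a shared cutoff `k`. The series must be non-constant, since a constant series has zero total energy after mean removal.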