
Self-Supervised Contrastive Learning for Long-term Forecasting

Junwoo Park, Daehoon Gwak, Jaegul Choo, Edward Choi

TL;DR

This paper introduces a novel approach that overcomes the difficulty sliding-window methods have in capturing long-term variations beyond the window. It does so by employing contrastive learning and an enhanced decomposition architecture, both specifically designed to focus on long-term variations, and significantly improves long-term forecasting performance.

Abstract

Long-term forecasting presents unique challenges due to the time and memory complexity of handling long sequences. Existing methods, which rely on sliding windows to process long sequences, struggle to effectively capture long-term variations that are only partially captured within the short window (i.e., outer-window variations). In this paper, we introduce a novel approach that overcomes this limitation by employing contrastive learning and an enhanced decomposition architecture, specifically designed to focus on long-term variations. To this end, our contrastive loss incorporates the global autocorrelation present across the whole time series, which facilitates the construction of positive and negative pairs in a self-supervised manner. When combined with our decomposition networks, our contrastive learning significantly improves long-term forecasting performance. Extensive experiments demonstrate that our approach outperforms 14 baseline models in multiple experiments over nine long-term benchmarks, especially in challenging scenarios that require a significantly long output for forecasting. Source code is available at https://github.com/junwoopark92/Self-Supervised-Contrastive-Forecsating.
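The abstract's central ingredient is an autocorrelation computed over the whole series rather than within a single window, so that lags longer than the window still carry signal. Below is a minimal NumPy sketch, assuming a univariate, non-constant series and the standard FFT-based (biased) estimator. The name `global_autocorrelation` is hypothetical and not taken from the authors' repository.

```python
import numpy as np


def global_autocorrelation(x):
    """Autocorrelation of a 1-D series at every lag 0..len(x)-1.

    Uses the FFT (Wiener-Khinchin theorem): the inverse transform of the
    power spectrum of the zero-mean series is its autocovariance.
    Assumes a non-constant series (otherwise the lag-0 value is zero).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Zero-pad to length 2n so the circular correlation equals the linear one.
    spec = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(spec * np.conj(spec), n=2 * n)[:n]
    # Normalise by lag 0 so that R[0] == 1 (biased estimator).
    return acov / acov[0]


if __name__ == "__main__":
    # Toy series with period 100, longer than a hypothetical 48-step window.
    t = np.arange(1000)
    r = global_autocorrelation(np.sin(2 * np.pi * t / 100))
    print(round(float(r[100]), 3), round(float(r[50]), 3))
    # → 0.9 -0.95
```

With a hypothetical 48-step window, neither the peak at lag 100 nor the trough at lag 50 is visible inside any single window, yet both show up clearly in the whole-series autocorrelation. This is the kind of outer-window variation the abstract refers to.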

Paper Structure

This paper contains 30 sections, 11 equations, 17 figures, 11 tables, and 1 algorithm.

Figures (17)

  • Figure 1: Long-term variations span beyond the conventional window. Non-zero correlations (left y-axis) persist at lags longer than the window size, and Fourier components (right y-axis) have periods longer than the window size.
  • Figure 2: (Top) Electricity time series including a long-term variation beyond the window size. (Bottom) Representation similarities of four models between an anchor window $\mathcal{W}_2$ and all other windows, including $\mathcal{W}_1$ and $\mathcal{W}_3$. To clearly highlight long-term correlation, we smoothed the fluctuations caused by short-term correlation. The details of the visualization are given in the Appendix. Even though $\mathcal{W}_2$ has a temporal pattern similar to that of $\mathcal{W}_1$ due to yearly periodicity, the three models other than Ours fail to capture this periodicity in their representations: they yield nearly identical cosine similarity scores (i.e., $Sim(\mathcal{W}_2, \mathcal{W}_1) \approx Sim(\mathcal{W}_2, \mathcal{W}_3)$) between the representations of the input parts of each window. Ours captures this periodicity, which contributes to its lower mean squared error (0.275) in long-term prediction compared with PatchTST (0.332) and TimesNet (0.417).
  • Figure 3: Example of the relative selection strategy in our AutoCon. Three windows are sampled at different times $t_1$, $t_2$, and $t_3$ from the entire series to form the batch. This batch contains three possible positive pairs, one per anchor. For each pair, a global autocorrelation is computed at a lag equal to the time distance between its two windows. The autocorrelation of each anchor positive pair is then compared with those of the remaining pairs, and the pairs with lower autocorrelation are designated as its negative pairs.
  • Figure 4: An overview of the redesigned architecture for long-term representation and forecasting.
  • Figure 5: The outer-window autocorrelation exists in varying degrees in four datasets.
  • ...and 12 more figures
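The relative selection strategy described for Figure 3 can be sketched as follows. This is a minimal illustration based on one reading of the caption. Pairs are taken to be the unordered combinations of windows in the batch, and each pair is scored by the global autocorrelation at the distance between its window start times. Any pair scoring strictly lower than a given positive pair serves as one of its negatives. The function name `relative_pairs` and the toy autocorrelation are hypothetical, and the AutoCon loss built on top of these pairs is not reproduced here.

```python
import itertools

import numpy as np


def relative_pairs(starts, acf):
    """Map each window pair to the pairs treated as its negatives.

    starts: start times of the windows sampled into one batch.
    acf:    global autocorrelation indexed by lag (acf[k] is lag k).
    """
    # All unordered pairs of windows in the batch, e.g. (0, 1), (0, 2), (1, 2).
    pairs = list(itertools.combinations(range(len(starts)), 2))
    # Score each pair by the autocorrelation at the time distance between
    # its two windows.
    score = {p: acf[abs(starts[p[0]] - starts[p[1]])] for p in pairs}
    # A pair's negatives are all other pairs with strictly lower scores.
    return {
        p: [q for q in pairs if q != p and score[q] < score[p]]
        for p in pairs
    }


if __name__ == "__main__":
    # Hypothetical damped-periodic autocorrelation with period 100.
    lags = np.arange(1000)
    acf = np.exp(-lags / 200) * np.cos(2 * np.pi * lags / 100)
    print(relative_pairs([0, 100, 150], acf))
    # → {(0, 1): [(0, 2), (1, 2)], (0, 2): [(1, 2)], (1, 2): []}
```

Because negatives are defined relative to each positive pair rather than by a fixed threshold, the pair with the lowest autocorrelation in the batch has no negatives of its own in this sketch.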