Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning

Haozhi Gao; Qianqian Ren; Jinbao Li

TL;DR

DE-TSMCL is a distillation-enhanced framework for long-sequence time series forecasting. It adaptively learns whether to mask each timestamp to obtain optimized sub-sequences, and it adds a supervised task that learns more robust representations and facilitates the contrastive learning process.

Abstract

Contrastive representation learning is crucial in time series analysis because it alleviates data noise, data incompleteness, and the sparsity of supervision signals. However, existing contrastive learning frameworks usually focus on intra-temporal features and therefore fail to fully exploit the intricate nature of time series data. To address this issue, we propose DE-TSMCL, a distillation-enhanced framework for long-sequence time series forecasting. Specifically, we design a learnable data augmentation mechanism that adaptively learns whether to mask a timestamp in order to obtain optimized sub-sequences. We then propose a contrastive learning task with momentum update that explores both inter-sample and intra-temporal correlations to learn the underlying structural features of unlabeled time series. Meanwhile, we design a supervised task to learn more robust representations and facilitate the contrastive learning process. Finally, we jointly optimize the two tasks. By building the model loss from multiple tasks, we learn effective representations for the downstream forecasting task. Extensive experiments against state-of-the-art methods demonstrate the effectiveness of DE-TSMCL, with a maximum improvement of 27.3%.
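
To make the "contrastive learning task with momentum update" more concrete, the following is a minimal, generic sketch of the two ingredients commonly used in momentum contrastive learning: an exponential-moving-average update of a key encoder from a query encoder, and an InfoNCE-style loss. It is not the authors' implementation. The function names, the momentum coefficient `m`, the `temperature` value, and the use of flat parameter lists are all assumptions made for illustration.

```python
import math

def momentum_update(query_params, key_params, m=0.999):
    """Move key-encoder parameters toward the query encoder by an
    exponential moving average: k <- m * k + (1 - m) * q.
    (Illustrative: parameters are plain lists of floats.)"""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query: the negative log-softmax of the
    positive pair's similarity against all candidate keys."""
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]

# Hypothetical usage: the loss is small when the positive key matches the
# query and large when a negative key matches it instead.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
new_key_params = momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9)
```

In a joint setup like the one the abstract describes, a contrastive term of this kind would be optimized together with a supervised loss; how the two terms are weighted is not stated in this section, so no weighting is shown here.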

Paper Structure (37 sections, 18 equations, 10 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Time Series Forecasting
Contrastive Learning
Knowledge Distillation
Preliminary
Problem Statement
Momentum Contrastive Learning
Methodology
Overall Framework
Data Processing and Augmentation
Dual-cropping
Projection Head
Learnable Data Augmentation
Representation Learning with Knowledge Distillation
...and 22 more sections

Figures (10)

Figure 1: The overall architecture of DE-TSMCL. It consists of four major components: data augmentation, representation learning, supervised task, and self-supervised task.
Figure 2: The design of the encoder, where the sequence follows GELU-DilatedConv-GELU-DilatedConv structure.
Figure 3: The effect of each component of DE-TSMCL for univariate time series forecasting.
Figure 4: The effect of each component of DE-TSMCL for multivariate time series forecasting.
Figure 5: The impact of $\lambda$ on four different datasets for univariate time series forecasting.
...and 5 more figures
