Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System

Mingjie Li; Rui Liu; Guangsi Shi; Mingfei Han; Changling Li; Lina Yao; Xiaojun Chang; Ling Chen

TL;DR

This work tackles data redundancy in rolling-window long-term time-series forecasting by introducing CLMFormer, a Transformer-based framework that combines curriculum-learning-driven noise with a memory-driven decoder to diversify training samples. The method uses a progressive dropout schedule and a seasonal memory module, including Memory-driven Conditional Layer Normalization and a Seasonal Memory Matrix, to improve pattern recognition and capture seasonality in highly similar data. Experiments on six real-world benchmarks show improvements of up to 30% over strong Transformer baselines, with pronounced gains at longer prediction horizons and when the method is integrated with state-of-the-art models such as FEDformer. The approach is shown to be compatible with existing LTSF architectures and offers a practical path toward more robust long-range forecasts in domains with limited diverse training data.

Abstract

Long-term time-series forecasting (LTSF) is fundamental to various real-world applications, where Transformer-based models have become the dominant framework due to their ability to capture long-range dependencies. However, these models often overfit because of data redundancy in rolling forecasting settings, which limits their generalization ability; the effect is particularly evident in longer sequences with highly similar adjacent data. In this work, we introduce CLMFormer, a novel framework that mitigates redundancy through curriculum learning and a memory-driven decoder. Specifically, we progressively introduce Bernoulli noise to the training samples, which breaks the high similarity between adjacent data points. This curriculum-driven noise supplies the memory-driven decoder with more diverse and representative training data. The memory-driven decoder, in turn, captures seasonal tendencies and dependencies in the time-series data and leverages temporal relationships to facilitate forecasting. Extensive experiments on six real-world LTSF benchmarks show that CLMFormer consistently improves Transformer-based models by up to 30%, demonstrating its effectiveness in long-horizon forecasting.
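
The progressive Bernoulli noise described in the abstract can be illustrated with a minimal sketch. The abstract does not give the schedule, so the linear ramp, the `max_drop` value, and the function names `keep_probability` and `apply_bernoulli_noise` below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def keep_probability(epoch, total_epochs, max_drop=0.3):
    """Hypothetical linear curriculum: the drop rate grows from 0 to max_drop."""
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return 1.0 - max_drop * frac

def apply_bernoulli_noise(window, epoch, total_epochs, max_drop=0.3, rng=None):
    """Zero out entries of a training window with a Bernoulli mask.

    Early epochs see (almost) clean data; later epochs see more masked
    entries, so adjacent, highly overlapping windows differ more.
    No rescaling of the kept values is applied (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = keep_probability(epoch, total_epochs, max_drop)
    mask = rng.binomial(1, keep, size=window.shape)  # 1 = keep, 0 = drop
    return window * mask

# Usage: the same window at the first and last epoch of a 10-epoch run.
window = np.ones((1000, 4))
clean = apply_bernoulli_noise(window, epoch=0, total_epochs=10,
                              rng=np.random.default_rng(0))
noisy = apply_bernoulli_noise(window, epoch=9, total_epochs=10,
                              rng=np.random.default_rng(0))
drop_fraction = 1.0 - noisy.mean()  # close to max_drop at the final epoch
```

Under this assumed schedule the first epoch leaves the data untouched, and the fraction of masked entries approaches `max_drop` by the final epoch.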

Paper Structure (28 sections, 13 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Time Series Forecasting
Transformer-based LTSF Systems
Methodology
Base Model
Input Embedding
Encoder-Decoder Architecture
Loss Function
Progressive Training Strategy
Seasonal Memory-driven Forecasting
Memory-driven Conditional Layer Normalization
Seasonal Memory Matrix
Gate Mechanism
Experiments
...and 13 more sections

Figures (5)

Figure 1: Illustration of the rolling forecasting setting with stride size=1 and window size=6.
Figure 2: An overview of the model architecture. $M_{t-1}$ is the memory matrix from the previous prediction; $M_{t}$ is the memory matrix updated during the current sequence prediction, which is then reused as $M_{t-1}$ for the next prediction.
Figure 3: The interaction between curriculum learning and memory-driven mechanism.
Figure 4: Detailed architectures of the memory-driven conditional layer normalization layer (left, green box) and the gate mechanism (right, purple box).
Figure 5: Samples of time series data from the ETTh1, Weather, and Air Quality datasets, along with predictions by FEDformer and our CLMFormer. We employ multivariate settings with a prediction length of 720 and visualize the OT, temperature, and PM10 values for the ETTh1, Weather, and Air Quality datasets, respectively.
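
The rolling forecasting setting in Figure 1 (stride size 1, window size 6) can be sketched as follows to show where the redundancy comes from. The helper `rolling_windows` and the toy series are hypothetical, and the window is treated here as the full sample length, which is an assumption about the figure.

```python
import numpy as np

def rolling_windows(series, window=6, stride=1):
    """Slice a 1-D series into overlapping samples (hypothetical helper)."""
    starts = range(0, len(series) - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

# Toy series of 10 distinct time steps.
series = np.arange(10)
samples = rolling_windows(series, window=6, stride=1)

# With stride 1, adjacent samples share all but one time step.
shared = np.intersect1d(samples[0], samples[1]).size
overlap_ratio = shared / samples.shape[1]
```

In this toy case each pair of adjacent samples shares 5 of 6 time steps, which is the kind of near-duplicate training data the paper identifies as a source of overfitting.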
