KEDformer: Knowledge Extraction Seasonal Trend Decomposition for Long-term Sequence Prediction

Zhenkai Qin, Baozhong Wei, Caifeng Gao, Jianyuan Ni

TL;DR

KEDformer addresses long-horizon time series forecasting by combining a knowledge extraction mechanism with seasonal-trend decomposition. It introduces Knowledge Extraction Attention (KEDA) to reduce self-attention complexity from $O(L^2)$ to $O(L \log L)$, and employs MSTWDecomp to separate seasonal and trend components, so that short-term fluctuations and long-term patterns can be modeled separately. The framework uses KL-based distillation and a distillation score $M(q_i,K)$ to selectively emphasize informative query-key pairs, with decoupled processing in the encoder and decoder. Across five public datasets, KEDformer outperforms established Transformer-based models and shows favorable efficiency, especially for long sequences, while ablation studies support the contribution of the KEDA and decomposition components. The approach offers a practical and scalable solution for long-term forecasting in domains such as energy, transport, and weather.
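The summary above names a score $M(q_i,K)$ but does not define it. As a rough illustration only, the sketch below assumes the KL-derived sparsity measure used by ProbSparse attention in Informer, $M(q_i,K) = \ln \sum_j e^{q_i k_j^\top/\sqrt{d}} - \frac{1}{L_K}\sum_j q_i k_j^\top/\sqrt{d}$, and keeps the top $u = \lceil c \ln L \rceil$ queries. The function names `sparsity_score` and `top_queries` and the constant `c` are hypothetical, and KEDformer's actual score may differ.

```python
import numpy as np

def sparsity_score(Q, K):
    """Assumed Informer-style score: logsumexp minus mean of scaled dot products."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # shape (L_q, L_k)
    m = scores.max(axis=1, keepdims=True)          # subtract row max for stability
    lse = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    return lse - scores.mean(axis=1)

def top_queries(Q, K, c=5.0):
    """Indices of the u = ceil(c * ln L) queries with the largest score."""
    L = Q.shape[0]
    u = min(L, max(1, int(np.ceil(c * np.log(L)))))
    M = sparsity_score(Q, K)
    return np.argsort(M)[::-1][:u]                 # descending by score

rng = np.random.default_rng(0)
Q = rng.normal(size=(96, 8))
K = rng.normal(size=(96, 8))
selected = top_queries(Q, K)
```

For clarity this sketch builds the full $L \times L$ score matrix, so it does not itself reach $O(L \log L)$; a sub-quadratic variant would also need to estimate the score from a sampled subset of keys.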

Abstract

Time series forecasting is a critical task in domains such as energy, finance, and meteorology, where accurate long-term predictions are essential. While Transformer-based models have shown promise in capturing temporal dependencies, their application to extended sequences is limited by computational inefficiencies and limited generalization. In this study, we propose KEDformer, a knowledge extraction-driven framework that integrates seasonal-trend decomposition to address these challenges. KEDformer leverages knowledge extraction methods that focus on the most informative weights within the self-attention mechanism to reduce computational overhead. Additionally, the proposed KEDformer framework decouples time series into seasonal and trend components. This decomposition enhances the model's ability to capture both short-term fluctuations and long-term patterns. Extensive experiments on five public datasets from energy, transportation, and weather domains demonstrate the effectiveness and competitiveness of KEDformer, providing an efficient solution for long-term time series forecasting.
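The abstract's split into seasonal and trend components can be illustrated with a simple moving-average decomposition. This is a minimal sketch under stated assumptions, not MSTWDecomp itself, whose details are not given here: the trend is taken as a single centered moving average with edge-replicated padding, and the seasonal part is the residual. The name `decompose` and the default `window` are hypothetical.

```python
import numpy as np

def decompose(x, window=25):
    """Split a 1-D series into (seasonal, trend) with a centered moving average."""
    x = np.asarray(x, dtype=float)
    front = (window - 1) // 2
    back = window - 1 - front
    # Replicate the edge values so the trend has the same length as the input.
    padded = np.concatenate([np.repeat(x[:1], front), x, np.repeat(x[-1:], back)])
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend                           # residual after removing the trend
    return seasonal, trend

t = np.arange(200)
series = 0.05 * t + np.sin(2 * np.pi * t / 24)     # synthetic trend plus a period-24 cycle
seasonal, trend = decompose(series, window=25)
```

By construction `seasonal + trend` reproduces the input exactly, so the two parts can be processed separately and recombined.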

Paper Structure

This paper contains 22 sections, 24 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Schematic overview of the proposed KEDformer method. Initially, the Knowledge Extraction Attention module (KEDA, blue block) is designed to reduce model parameters by using self-correlation and sparse attention mechanisms. More specifically, the self-correlation mechanism estimates the correlation of subsequences within a specific time period, while sparse attention is employed to filter the weight matrix of these correlations. After that, the time series decomposition (MSTW, yellow block) method is used to extract seasonal and trend patterns from the input time series data.
  • Figure 2: Visualization of time series decomposition. In the left subfigure (a), the raw time series data is shown without decomposition, displaying interwoven fluctuations and trends. In contrast, the right subfigure (b) presents the time series decomposed into three components: the original time series in purple, the trend-cyclical component in beige, and the seasonal component in teal. Using data from the ETTm1 dataset, this decomposition reveals distinct seasonal and trend-cyclical patterns, enabling the model to better capture both periodic variations and long-term trends within the data.
  • Figure 3: The results of the time series decomposition experiment are presented. In a comparative experiment that controls the number of KEDformer mechanisms during the encoding and decoding processes, we set the input length $I=96$ and the prediction lengths $O \in \{96, 192, 336\}$. The evaluation metrics used are Mean Squared Error (MSE) and Mean Absolute Error (MAE), with lower values indicating better model performance.
  • Figure 4: The impact of the number of KEDformer mechanisms on the model's computational efficiency is evaluated by varying that number. The input length is set to $I=96$, and the prediction lengths are $O \in \{512, 720, 1024, 1280\}$. The time required for each epoch is used as an indicator of the model's computational speed.
  • Figure 5: In the experiment for model computational efficiency and performance analysis, four different models are used to perform long-term time series forecasting tasks on the Exchange dataset. The input length is set to $I=96$, and the prediction lengths are $O \in \{96, 192, 336, 720\}$. The metric for evaluating computational efficiency is the time (in seconds) taken by each model to compute one epoch, while the performance metrics are the Mean Squared Error (MSE) and Mean Absolute Error (MAE).
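The two error metrics named in the captions above have standard definitions, shown here as a small sketch. The helper names `mse` and `mae` are illustrative and are not taken from the paper's code.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))
```

Both are averaged over every predicted time step, and lower values indicate better forecasts; MSE penalizes large individual errors more heavily than MAE.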