Unlocking the Power of LSTM for Long Term Time Series Forecasting

Yaxuan Kong, Zepu Wang, Yuqi Nie, Tian Zhou, Stefan Zohren, Yuxuan Liang, Peng Sun, Qingsong Wen

TL;DR

The paper tackles long-term multivariate time series forecasting (TSF) by revisiting LSTM-based models through the sLSTM framework and introducing P-sLSTM, which adds patching and channel independence to address short memory and overfitting. It provides theoretical grounding via a Markov-chain ergodicity analysis that identifies conditions under which memory is controlled, and shows that patching can compensate for the inherent memory limitations of sLSTM. Empirically, P-sLSTM achieves state-of-the-art or competitive performance on the Weather, Electricity, Solar, ETTm1, and PEMS03 datasets, often outperforming LSTM and sLSTM while remaining efficient. The work shows that carefully designed RNN-based architectures can rival Transformer-based approaches for TSF, and it offers practical guidance on memory control, patching, and channel-wise design.
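The paper's ergodicity theorem and its exact conditions are not reproduced here. As a toy illustration of why ergodicity implies limited memory, consider a two-state Markov chain with switching probabilities p = P(0 → 1) and q = P(1 → 0). With 0 < p, q ≤ 1 and p + q < 2, the chain is ergodic, and the total-variation distance between two copies started in different states shrinks exactly as |1 − p − q|^t. The initial condition is therefore forgotten geometrically fast. The sketch below checks this numerically. The function names and the values of p and q are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

Dist = Tuple[float, float]  # (P(state 0), P(state 1))

def step(dist: Dist, p: float, q: float) -> Dist:
    """One transition of a two-state chain with P(0 -> 1) = p and P(1 -> 0) = q."""
    a, b = dist
    return (a * (1 - p) + b * q, a * p + b * (1 - q))

def tv_distance(d1: Dist, d2: Dist) -> float:
    """Total-variation distance between two distributions on {0, 1}."""
    return 0.5 * (abs(d1[0] - d2[0]) + abs(d1[1] - d2[1]))

def forgetting_curve(p: float, q: float, steps: int) -> List[float]:
    """TV distance at t = 0..steps between copies started in state 0 and in state 1."""
    d0, d1 = (1.0, 0.0), (0.0, 1.0)
    curve = []
    for _ in range(steps + 1):
        curve.append(tv_distance(d0, d1))
        d0, d1 = step(d0, p, q), step(d1, p, q)
    return curve

if __name__ == "__main__":
    p, q = 0.3, 0.2  # arbitrary illustrative switching probabilities
    for t, d in enumerate(forgetting_curve(p, q, 5)):
        # The simulated distance matches the closed form |1 - p - q| ** t.
        print(t, round(d, 6), round(abs(1 - p - q) ** t, 6))
```

When p + q = 1, the chain forgets its starting state after a single step. When p and q both approach 0, so that states rarely switch, the rate |1 − p − q| approaches 1 and the influence of the initial state persists much longer.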

Abstract

Traditional recurrent neural network architectures, such as long short-term memory networks (LSTM), have historically held a prominent role in time series forecasting (TSF). The recently introduced sLSTM for natural language processing (NLP) adds exponential gating and memory mixing, which benefit long-term sequential learning, but its potential short-memory issue is a barrier to applying it directly to TSF. To address this, we propose a simple yet efficient algorithm named P-sLSTM, which builds on sLSTM by incorporating patching and channel independence. These modifications substantially improve sLSTM's performance in TSF and achieve state-of-the-art results. Furthermore, we provide theoretical justifications for our design and conduct extensive comparative and analytical experiments to validate the efficiency and superior performance of our model.
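The exponential gating mentioned above follows the sLSTM cell of Beck et al. (2024, xLSTM). In that cell, the input gate is exponential and the forget gate may be either sigmoid or exponential. A normalizer state n_t divides the cell state c_t, and a stabilizer m_t rescales both gates so that the exponentials stay in floating-point range without changing the ratio c_t / n_t.

Below is a minimal single-unit sketch in plain Python. It assumes scalar inputs and an exponential forget gate. The `Params` layout and all function names are hypothetical. P-sLSTM's actual blocks (heads, projections, normalization, patch embeddings) are omitted. In xLSTM, memory mixing comes from recurrent matrices applied to the previous hidden state, block-diagonal across heads. With one unit, this reduces to the scalar recurrent weight `r`.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical parameter layout: gate name -> (input weight w, recurrent weight r, bias b).
Params = Dict[str, Tuple[float, float, float]]

@dataclass
class SLSTMState:
    c: float = 0.0            # stabilized cell state
    n: float = 0.0            # stabilized normalizer state
    m: float = float("-inf")  # stabilizer (running log-scale of the gates)
    h: float = 0.0            # hidden output

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def slstm_step(x: float, s: SLSTMState, p: Params) -> SLSTMState:
    # Pre-activation of each gate: input term + recurrent term on h_{t-1} + bias.
    pre = {g: w * x + r * s.h + b for g, (w, r, b) in p.items()}
    z = math.tanh(pre["z"])  # cell input
    o = sigmoid(pre["o"])    # output gate
    log_i = pre["i"]         # exponential input gate: i_t = exp(i~_t)
    log_f = pre["f"]         # exponential forget gate: f_t = exp(f~_t)
    # Stabilizer: both gates are rescaled by exp(-m_t), so neither exponential overflows.
    m = max(log_f + s.m, log_i)
    i_stab = math.exp(log_i - m)
    f_stab = math.exp(log_f + s.m - m)
    c = f_stab * s.c + i_stab * z
    n = f_stab * s.n + i_stab
    # The common factor exp(-m_t) cancels in c / n, so h equals the unstabilized output.
    h = o * (c / n)
    return SLSTMState(c=c, n=n, m=m, h=h)

def run_slstm(xs: List[float], p: Params) -> List[float]:
    """Feed a scalar sequence through one sLSTM unit and collect the hidden outputs."""
    s = SLSTMState()
    out = []
    for x in xs:
        s = slstm_step(x, s, p)
        out.append(s.h)
    return out

if __name__ == "__main__":
    params = {"z": (1.0, 0.5, 0.0), "i": (0.5, 0.1, 0.0),
              "f": (0.2, 0.1, 1.0), "o": (0.3, 0.2, 0.0)}
    print(run_slstm([0.1, -0.4, 0.8, 0.3], params))
```

Initializing m to −∞ makes the first step take i_stab = 1 and f_stab = 0. After that step, either i_stab or f_stab equals 1, so the stabilized normalizer never drops below 1 and the division stays safe.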

Paper Structure

This paper contains 29 sections, 7 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Overview of the P-sLSTM architecture (top left: sLSTM structure; bottom left: sLSTM block, from Beck et al., 2024).
  • Figure 2: Effect of patch size on the MSE of P-sLSTM on the Weather dataset.
  • Figure 3: MSE (y-axis) of models with different look-back window sizes (x-axis) for long-term forecasting (T = 720) on the Weather dataset.
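Figures 2 and 3 vary the patch size and the look-back window. Together, these two settings determine how many tokens the recurrent model processes for each channel.

Below is a minimal sketch of patching with channel independence. It assumes PatchTST-style patching: a fixed patch length P and stride S, with no padding, so a window of length L yields N = ⌊(L − P)/S⌋ + 1 patches. The helper names and the toy values are hypothetical, and P-sLSTM's exact patch settings may differ. Under channel independence, each variable of the multivariate window becomes its own univariate sequence of patches, and all channels are processed by one shared model.

```python
from typing import List

def make_patches(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split one univariate look-back window into (possibly overlapping) patches."""
    if patch_len > len(series):
        raise ValueError("patch_len must not exceed the window length")
    n_patches = (len(series) - patch_len) // stride + 1
    return [series[i * stride : i * stride + patch_len] for i in range(n_patches)]

def channel_independent_patches(window: List[List[float]], patch_len: int,
                                stride: int) -> List[List[List[float]]]:
    """window[t][c] holds channel c at time t (shape L x C).

    Channel independence: every channel becomes its own univariate sample, so a
    single model with shared weights sees C separate patch sequences.
    """
    n_channels = len(window[0])
    channels = [[row[c] for row in window] for c in range(n_channels)]
    return [make_patches(ch, patch_len, stride) for ch in channels]

if __name__ == "__main__":
    L = 16  # toy look-back length with two channels
    window = [[float(t), float(10 * t)] for t in range(L)]
    patches = channel_independent_patches(window, patch_len=4, stride=4)
    print(len(patches), len(patches[0]), patches[1][0])
```

With non-overlapping patches (stride equal to patch length), a hypothetical look-back window of 336 steps and a patch length of 16 become 21 recurrent steps per channel instead of 336. This shortens the distance over which the recurrence must carry information.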