
PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting

Tian Sun, Yuqi Chen, Weiwei Sun

Abstract

Despite advances in Transformer architectures, their effectiveness for long-term time series forecasting (LTSF) remains controversial. In this paper, we investigate the potential of integrating explicit periodicity modeling into the self-attention mechanism to enhance the performance of Transformer-based architectures for LTSF. Specifically, we propose PENGUIN, a simple yet effective periodic-nested group attention mechanism. Our approach introduces a periodic-aware relative attention bias to directly capture periodic structures and a grouped multi-query attention mechanism to handle multiple coexisting periodicities (e.g., daily and weekly cycles) within time series data. Extensive experiments across diverse benchmarks demonstrate that PENGUIN consistently outperforms both MLP-based and Transformer-based models. Code is available at https://github.com/ysygMhdxw/AISTATS2026_PENGUIN.

Paper Structure

This paper contains 46 sections, 9 equations, 5 figures, 15 tables.

Figures (5)

  • Figure 1: Illustration of PENGUIN, a novel Transformer model with periodic-nested group attention tailored for LTSF. We first transform time series into patch embeddings in a channel-independent manner. Next, the encoder module comprises the proposed periodic-nested group attention and a feed-forward network. The right part of the figure illustrates the periodic-nested group attention with periodic/non-periodic attention bias.
  • Figure 2: Illustration of periodic attention modeling with an overlapping patching technique. Given a time series with a period of 12, the patching length $P$ and stride $S$ are set to 8 and 4, respectively. Patch embeddings that are three positions apart cover the same phase of the period in the original time series, leading to a period after patching of $\mathcal{P}_{S}=[3]$.
  • Figure 3: Efficiency analysis of five models (i.e., PatchTST [nie2022time], CATS [kim2024self], TimesNet [wu2022timesnet], FEDformer [zhou2022fedformer], and PENGUIN) on the ETTm1 dataset. The experiments are conducted with a look-back length $L$ of $336$. Both the number of parameters and MACs are plotted on a logarithmic scale.
  • Figure 4: Visualization of PENGUIN and PatchTST on the Traffic dataset, using $\mathcal{P} = \{24, 168\}$ with a patch length $P = 16$ and stride $S = 8$. For PatchTST, we train two variants with different masking strategies (i.e., full attention and attention with a causal mask). For PENGUIN, we visualize two attention heads, each belonging to a different attention group.
  • Figure 5: Experimental results of various models with different look-back lengths $L$ on the ETTm1 and Traffic datasets. Results are averaged over the prediction horizons $H \in \{96,192,336,720\}$.
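
The patch-level period arithmetic in the captions of Figures 2 and 4 can be checked with a small sketch. It assumes that a period measured in time steps maps to a period measured in patch indices by dividing by the stride $S$, which is what the Figure 2 example (12 / 4 = 3) suggests. The function names `patch_level_periods`, `num_patches`, and `same_phase_mask` are hypothetical and are not taken from the PENGUIN code base. The patch count assumes no padding of the input, and the boolean mask only marks which patch pairs share a phase. It is not the learned attention bias described in the paper.

```python
def patch_level_periods(periods, stride):
    """Convert periods in time steps to periods in patch indices.

    Assumption: each period is an exact multiple of the stride,
    as in the caption examples (12 / 4 and {24, 168} / 8).
    """
    result = []
    for period in periods:
        if period % stride != 0:
            raise ValueError(f"period {period} is not a multiple of stride {stride}")
        result.append(period // stride)
    return result


def num_patches(lookback, patch_len, stride):
    """Number of overlapping patches, assuming the series is not padded."""
    return (lookback - patch_len) // stride + 1


def same_phase_mask(n_patches, patch_period):
    """Boolean matrix: True where patches i and j are a whole number of
    patch-level periods apart, i.e. they cover the same phase."""
    return [
        [(i - j) % patch_period == 0 for j in range(n_patches)]
        for i in range(n_patches)
    ]


# Figure 2 setting: period 12, stride 4
fig2_periods = patch_level_periods([12], stride=4)

# Figure 4 setting: periods {24, 168}, stride 8
fig4_periods = patch_level_periods([24, 168], stride=8)

# Same-phase pattern for the first six patches in the Figure 2 setting
mask = same_phase_mask(6, fig2_periods[0])
```

Under these assumptions, the Figure 4 setting yields patch-level periods of 3 and 21, so the daily and weekly cycles would fall on different diagonals of the attention map.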