Sentinel: Multi-Patch Transformer with Temporal and Channel Attention for Time Series Forecasting

Davide Villaboni, Alberto Castellini, Ivan Luciano Danesi, Alessandro Farinelli

TL;DR

Sentinel tackles multivariate time-series forecasting with a fully transformer-based architecture whose encoder specializes in cross-channel context and whose decoder specializes in temporal causality. It introduces a multi-patch attention mechanism that uses input patching in place of the traditional multi-head split, so that a single architecture models both inter-channel and temporal dependencies. Empirical results across multiple benchmarks show Sentinel achieving state-of-the-art or competitive performance, with ablations confirming the importance of the channel-focused encoder and the patch-based attention. The approach provides a patch-driven framework with potential for further gains through few-shot learning and targeted refinements to high- and low-feature regimes.

Abstract

Transformer-based time series forecasting has recently gained strong interest due to the ability of transformers to model sequential data. Most state-of-the-art architectures exploit either temporal or inter-channel dependencies, which limits their effectiveness in multivariate time-series forecasting, where both types of dependencies are crucial. We propose Sentinel, a fully transformer-based architecture composed of an encoder that extracts contextual information from the channel dimension and a decoder designed to capture causal relations and dependencies across the temporal dimension. Additionally, we introduce a multi-patch attention mechanism, which leverages the patching process to structure the input sequence in a way that can be naturally integrated into the transformer architecture, replacing the multi-head splitting process. Extensive experiments on standard benchmarks demonstrate that Sentinel, because of its ability to "monitor" both the temporal and the inter-channel dimensions, achieves better or comparable performance with respect to state-of-the-art approaches.

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Sentinel architecture
  • Figure 2: The figure illustrates the multi-patch attention mechanism. Initially, the time series is divided into multiple patches, where $N$ represents the number of patches and $P$ denotes the patch size. Each patch is generated with a stride $S$, which defines the distance between consecutive patches. On the right-hand side, the figure shows how this patching structure can be seamlessly integrated into a multi-head attention mechanism. By leveraging the patch-based representation, the patches serve as inputs to the attention layer, effectively exploiting the structure naturally induced by the patching operation.
  • Figure 3: Evolution of MAE and MSE Based on Lookback Window
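
The patching step described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a single-channel series, no padding at the end of the series, and the common convention that the number of patches is N = floor((L - P) / S) + 1 for a series of length L; the paper's exact padding choice is not stated here, so this formula is an assumption. The function names `num_patches` and `make_patches` are hypothetical and are not taken from the authors' code.

```python
def num_patches(length: int, patch_size: int, stride: int) -> int:
    """Number of full patches N that fit in a series (assumes no padding)."""
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    if length < patch_size:
        return 0
    return (length - patch_size) // stride + 1


def make_patches(series, patch_size: int, stride: int):
    """Split a 1-D series into N patches of size P, taken every S steps.

    Illustrative only: the result is a list of N lists of length P, the
    structure that the caption says is fed to the attention layer.
    """
    n = num_patches(len(series), patch_size, stride)
    return [
        list(series[i * stride : i * stride + patch_size])
        for i in range(n)
    ]


# Toy example: lookback window of 10 steps, P = 4, S = 2.
series = list(range(10))
patches = make_patches(series, patch_size=4, stride=2)

for patch in patches:
    print(patch)
```

With S smaller than P, consecutive patches overlap; with S equal to P they tile the series without overlap. In a multivariate setting the same split would be applied to each channel independently, giving one N by P block per channel.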