Sentinel: Multi-Patch Transformer with Temporal and Channel Attention for Time Series Forecasting

Davide Villaboni; Alberto Castellini; Ivan Luciano Danesi; Alessandro Farinelli

TL;DR

Sentinel tackles multivariate time-series forecasting with a fully transformer-based architecture in which the encoder specializes in cross-channel context and the decoder specializes in temporal causality. It introduces a novel multi-patch attention mechanism that uses input patching in place of the traditional multi-head splitting, enabling efficient and effective modeling of both inter-channel and temporal dependencies. Empirical results across multiple benchmarks show that Sentinel achieves state-of-the-art or competitive performance, and ablations confirm the importance of both the channel-focused encoder and the patch-based attention. The approach provides a scalable, patch-driven framework with potential for further gains through few-shot learning and targeted refinements for high- and low-feature regimes.
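To make the idea of patches replacing heads concrete, here is a minimal sketch of one possible reading of multi-patch attention. It assumes that each patch index plays the role a head plays in multi-head attention, and that within each patch, attention is taken across channels (the encoder's cross-channel role). The helper names (`patchify`, `multi_patch_channel_attention`), the non-overlapping patching, and the identity query/key/value projections are illustrative assumptions, not the authors' implementation; a real model would use learned projections and batched tensor operations.

```python
import math
from typing import List

Matrix = List[List[float]]


def patchify(series: List[float], patch_len: int) -> Matrix:
    """Split one channel into non-overlapping patches (stride == patch_len).

    Assumption for this sketch: len(series) is a multiple of patch_len (no padding).
    """
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def multi_patch_channel_attention(x: Matrix, patch_len: int) -> Matrix:
    """x has shape (channels, time); each patch index acts as one 'head'.

    For patch p, the tokens are the p-th patch of every channel, so attention
    mixes information across channels within that time window only.
    """
    patched = [patchify(ch, patch_len) for ch in x]      # (C, P, patch_len)
    num_patches = len(patched[0])
    out: Matrix = [[] for _ in x]
    for p in range(num_patches):
        tokens = [patched[c][p] for c in range(len(x))]  # (C, patch_len)
        mixed = attention(tokens)
        for c, row in enumerate(mixed):
            out[c].extend(row)                           # concatenate the "heads"
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
    y = multi_patch_channel_attention(x, patch_len=2)
    print(len(y), len(y[0]))  # 3 4
```

In this reading, the patching step already provides the split that multi-head attention would otherwise obtain by slicing the embedding dimension, so no separate head reshape is needed. The output keeps the input's (channels, time) shape.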

Abstract

Transformer-based time-series forecasting has recently attracted strong interest because transformers are well suited to modeling sequential data. Most state-of-the-art architectures exploit either temporal or inter-channel dependencies, which limits their effectiveness in multivariate time-series forecasting, where both types of dependency are crucial. We propose Sentinel, a fully transformer-based architecture composed of an encoder that extracts contextual information from the channel dimension and a decoder designed to capture causal relations and dependencies across the temporal dimension. Additionally, we introduce a multi-patch attention mechanism, which leverages the patching process to structure the input sequence in a way that integrates naturally into the transformer architecture, replacing the multi-head splitting process. Extensive experiments on standard benchmarks demonstrate that Sentinel, because of its ability to "monitor" both the temporal and the inter-channel dimensions, achieves performance better than or comparable to state-of-the-art approaches.
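The decoder's temporal role can be illustrated with causal self-attention over patch tokens. The sketch below assumes a single channel, non-overlapping patches used directly as tokens, and identity query/key/value projections; `causal_patch_attention` is a hypothetical helper, not part of Sentinel's published code. Its only purpose is to show the causal constraint: the output for patch t depends on patches 0 through t and never on later ones.

```python
import math
from typing import List


def patchify(series: List[float], patch_len: int) -> List[List[float]]:
    """Non-overlapping patches; assumes len(series) is a multiple of patch_len."""
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def causal_patch_attention(series: List[float], patch_len: int) -> List[List[float]]:
    """Causal self-attention over the patch tokens of one channel.

    Patch t may attend only to patches 0..t, mirroring a decoder that must
    not look at future time steps.
    """
    tokens = patchify(series, patch_len)
    d = patch_len
    out = []
    for t, q in enumerate(tokens):
        visible = tokens[: t + 1]  # causal mask: future patches are dropped
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out


if __name__ == "__main__":
    out = causal_patch_attention([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], patch_len=2)
    print(out[0])  # [1.0, 2.0]: the first patch can only attend to itself
```

Changing the values of the last patch leaves the outputs for the first two patches unchanged, which is the property a causal decoder needs when forecasting.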
