A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting
Remi Genet, Hugo Inzirillo
TL;DR
This study tackles multivariate time series forecasting with the Temporal Kolmogorov-Arnold Transformer (TKAT), a memory-augmented, transformer-based encoder-decoder that integrates Temporal Kolmogorov-Arnold Networks (TKANs) with attention mechanisms. It uses Variable Selection Networks to focus learning on salient covariates and a Fully Aware Layer to preserve memory from attention outputs, with the aim of improving long-horizon forecasting. In benchmarks against TKAN, TFT-inspired variants, and standard recurrent models, TKAT performs best on a BTC notional forecasting task, particularly at longer horizons, which highlights the importance of task-specific architecture design. The findings suggest that memory-augmented, attention-based architectures tailored to the distribution and input structure of financial time series can offer meaningful gains in predictive accuracy and interpretability, at the cost of higher parameter counts and additional training considerations. Overall, the work presents TKAN-based transformers as a promising direction for scalable, long-range multivariate time series forecasting in finance and related domains.
Abstract
Capturing complex temporal patterns and relationships within multivariate data streams is a difficult task. We propose the Temporal Kolmogorov-Arnold Transformer (TKAT), a novel attention-based architecture designed to address this task using Temporal Kolmogorov-Arnold Networks (TKANs). Inspired by the Temporal Fusion Transformer (TFT), TKAT is an encoder-decoder model tailored to tasks in which the observed part of the features is more important than the a priori known part. The architecture combines the theoretical foundation of the Kolmogorov-Arnold representation with the power of transformers. TKAT aims to simplify the complex dependencies inherent in time series, making them more "interpretable". The transformer architecture in this framework allows the model to capture long-range dependencies through self-attention mechanisms.
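
To make the last point concrete, the following is a minimal sketch of scaled dot-product self-attention over a short sequence, written in plain Python. It is an illustration only and not the TKAT implementation: it assumes identity projections (queries, keys, and values are all the raw inputs), a single head, and a hypothetical toy series, whereas a real transformer layer uses learned projections. It shows that the attention weight between two time steps depends on their similarity rather than on how far apart they are.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: Q = K = V = x (no learned projections).
    x is a list of time steps, each a list of d features.
    Returns (outputs, weights), where weights[i][j] is how much step i
    attends to step j.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    weights = []
    for query in x:
        # Similarity of this step to every step in the sequence.
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in x]
        weights.append(softmax(scores))
    # Each output is a weighted average of all time steps' values.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical toy series: step 3 repeats the pattern seen at step 0.
series = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]
outputs, attn = self_attention(series)
```

In this toy series, step 0 places more weight on the distant step 3, which has the same features, than on the adjacent step 1, which does not. A recurrent model would have to carry that information through every intermediate step.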
