Time-TK: A Multi-Offset Temporal Interaction Framework Combining Transformer and Kolmogorov-Arnold Networks for Time Series Forecasting

Fan Zhang, Shiming Fan, Hua Wang

TL;DR

Time-TK addresses the bottleneck of independent per-step embeddings in long-horizon forecasting by introducing a multi-offset paradigm that combines three components: a Multi-Offset Token Embedding (MOTE), a Multi-Offset Interactive KAN (MI-KAN) with Gaussian RBFs, and a Multi-Offset Temporal Interaction (MOTI). The architecture fuses multiple offset sub-sequences with the original series through a global interaction mechanism, yielding a lightweight yet powerful predictor that maps historical data $\mathcal{X} \in \mathbb{R}^{N \times \mathcal{L}}$ to a forecast $\hat{\mathcal{Y}} \in \mathbb{R}^{N \times \mathcal{F}}$. Across 14 real-world datasets, Time-TK achieves state-of-the-art results, ranking first in 23 of 26 settings and showing statistically significant improvements over strong baselines, while maintaining favorable memory and computational efficiency. The work also demonstrates that integrating MOTE into other architectures yields consistent gains, underlining the method's generality for scalable long-term time-series forecasting in web-scale environments.
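To make the "KAN with Gaussian RBFs" idea more concrete, the following is a minimal sketch of a single learnable edge function expressed as a weighted sum of Gaussian radial basis functions. It is an illustration under stated assumptions, not the paper's MI-KAN: the function names, the shared `width` parameter, and the example centers and weights are all hypothetical, and this summary does not give the exact parameterization the authors use.

```python
import math
from typing import List, Sequence


def gaussian_rbf_features(x: float, centers: Sequence[float], width: float) -> List[float]:
    """Evaluate exp(-((x - c) / width)^2) for every center c.

    Hypothetical helper: one shared width for all centers is an assumption.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


def rbf_edge_function(
    x: float, centers: Sequence[float], width: float, weights: Sequence[float]
) -> float:
    """A KAN-style univariate function: a weighted sum of Gaussian RBFs.

    In a trained network the weights would be learned; here they are fixed
    illustrative values supplied by the caller.
    """
    if len(centers) != len(weights):
        raise ValueError("centers and weights must have the same length")
    feats = gaussian_rbf_features(x, centers, width)
    return sum(w * f for w, f in zip(weights, feats))


# Illustrative values only (not taken from the paper).
example_centers = [-1.0, 0.0, 1.0]
example_weights = [0.5, 1.0, 0.5]
example_value = rbf_edge_function(0.0, example_centers, 1.0, example_weights)
```

The appeal of this form is that each edge function is smooth and cheap to evaluate, which is consistent with the summary's emphasis on a lightweight model, although the efficiency figures reported for Time-TK come from the paper's own implementation rather than from a sketch like this.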

Abstract

Time series forecasting is crucial for the World Wide Web and represents a core technical challenge in ensuring the stable and efficient operation of modern web services, such as intelligent transportation and website throughput. However, we find that existing methods typically embed each time step as an independent token. This paradigm introduces a fundamental information bottleneck when processing long sequences. The root cause is that independent token embedding destroys a crucial structure within the sequence, which we term multi-offset temporal correlation: the fine-grained dependencies that span different time steps and are especially prevalent in regular Web data. To address this issue, we propose a new perspective on time series embedding. We provide an upper bound on the approximate reconstruction performance of token embedding, which guides our design of a concise yet effective Multi-Offset Time Embedding (MOTE) method to mitigate the performance degradation caused by standard token embedding. Furthermore, MOTE can be integrated into various existing models and serve as a universal building block. Based on this paradigm, we further design a novel forecasting architecture named Time-TK. This architecture first uses a Multi-Offset Interactive KAN (MI-KAN) to learn and represent specific temporal patterns among multiple offset sub-sequences. It then employs an efficient Multi-Offset Temporal Interaction (MOTI) mechanism to capture the complex dependencies between these sub-sequences, achieving global information integration. Extensive experiments on 14 real-world benchmark datasets, covering domains such as traffic flow and BTC/USDT throughput, demonstrate that Time-TK significantly outperforms all baseline models, achieving state-of-the-art forecasting accuracy.
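One way to picture "multiple offset sub-sequences" is as interleaved, strided views of the original series. The sketch below assumes exactly that reading: sub-sequence k collects every `num_offsets`-th point starting at index k. This interpretation, and the helper names `split_by_offset` and `merge_offsets`, are assumptions made for illustration; the abstract does not spell out how MOTE constructs its sub-sequences.

```python
from typing import List, Sequence


def split_by_offset(series: Sequence[float], num_offsets: int) -> List[List[float]]:
    """Split a series into `num_offsets` interleaved sub-sequences.

    Sub-sequence k holds series[k], series[k + num_offsets], and so on.
    Assumption: "offset" means a start index combined with a fixed stride.
    """
    if num_offsets < 1:
        raise ValueError("num_offsets must be >= 1")
    return [list(series[k::num_offsets]) for k in range(num_offsets)]


def merge_offsets(subs: Sequence[Sequence[float]]) -> List[float]:
    """Inverse of split_by_offset: interleave the sub-sequences back."""
    num_offsets = len(subs)
    total = sum(len(s) for s in subs)
    merged: List[float] = [0.0] * total
    for k, sub in enumerate(subs):
        # Extended-slice assignment puts sub-sequence k back at stride positions.
        merged[k::num_offsets] = list(sub)
    return merged


# Illustrative data: ten time steps split into three offset sub-sequences.
example_series = [float(t) for t in range(10)]
example_subs = split_by_offset(example_series, 3)
# example_subs == [[0.0, 3.0, 6.0, 9.0], [1.0, 4.0, 7.0], [2.0, 5.0, 8.0]]
```

Under this reading, each sub-sequence preserves dependencies between points that are a fixed number of steps apart, and the split is lossless because `merge_offsets` recovers the original series.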

Paper Structure

This paper contains 24 sections, 8 equations, 11 figures, and 9 tables.

Figures (11)

  • Figure 1: Illustration of four time series embedding strategies. (a) Mixed embedding of variables at the same time step. (b) Inverted embedding along the time axis. (c) Patch embedding based on temporal segmentation. (d) Multi-Offset embedding mechanism used in the proposed Time-TK.
  • Figure 2: (a) Average performance across all prediction windows, showing improvements over the baseline on various datasets. (b) Comparison of memory usage (GB), training time (ms/iter), and MSE on the Traffic dataset. The prediction length was set to 96.
  • Figure 3: Overall architecture of Time-TK. MOTE performs multi-offset token embedding on the sequence, MI-KAN then learns representations of the sub-sequences, and MOTI produces the final prediction through interaction among them.
  • Figure 4: Ablation study comparing Time-TK with its architectural variants on the Electricity and ETTm1 datasets across multiple prediction horizons.
  • Figure 5: t-SNE visualization after KAN and Transformer.
  • ...and 6 more figures