Understanding Token-level Topological Structures in Transformer-based Time Series Forecasting
Jianqi Zhang, Wenwen Qiang, Jingyao Wang, Jiahuan Zhou, Changwen Zheng, Hui Xiong
TL;DR
This work finds that Transformer-based time series forecasting (TSF) models progressively distort the original token-level topology as layers deepen, which limits predictive accuracy. It introduces the Topology Enhancement Method (TEM), a plug-and-play framework with two modules: PTEM, which preserves the original positional topology, and STEM, which preserves the original semantic topology. The injection weights of both modules are set by a bi-level optimization strategy. The authors derive generalization bounds showing that preserving topology tightens the bound, and they corroborate this with extensive experiments across multiple datasets and TSF baselines, where TEM yields consistent performance gains. The approach enhances existing Transformer TSF models without altering their core architectures, and the code is publicly released.
Abstract
Transformer-based methods have achieved state-of-the-art performance in time series forecasting (TSF) by capturing positional and semantic topological relationships among input tokens. However, it remains unclear whether existing Transformers fully leverage the intrinsic topological structure among tokens throughout intermediate layers. Through empirical and theoretical analyses, we identify that current Transformer architectures progressively degrade the original positional and semantic topology of input tokens as the network deepens, thus limiting forecasting accuracy. Furthermore, our theoretical results demonstrate that explicitly enforcing preservation of these topological structures within intermediate layers can tighten generalization bounds, leading to improved forecasting performance. Motivated by these insights, we propose the Topology Enhancement Method (TEM), a novel Transformer-based TSF method that explicitly and adaptively preserves token-level topology. TEM consists of two core modules: 1) the Positional Topology Enhancement Module (PTEM), which injects learnable positional constraints to explicitly retain original positional topology; 2) the Semantic Topology Enhancement Module (STEM), which incorporates a learnable similarity matrix to preserve original semantic topology. To determine optimal injection weights adaptively, TEM employs a bi-level optimization strategy. The proposed TEM is a plug-and-play method that can be integrated with existing Transformer-based TSF methods. Extensive experiments demonstrate that integrating TEM with a variety of existing methods significantly improves their predictive performance, validating the effectiveness of explicitly preserving original token-level topology. Our code is publicly available at: https://github.com/jlu-phyComputer/TEM.
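The abstract describes topology preservation only at a high level, so the following is a minimal illustrative sketch under loudly stated assumptions, not the TEM implementation. It assumes that "semantic topology" can be represented as the pairwise cosine-similarity matrix of token vectors, measures drift as the Frobenius distance between that matrix at the input and at an intermediate layer, and biases attention logits with the input similarity matrix (weight `alpha`) plus a linear distance prior standing in for a positional constraint (weight `beta`). All function names, the distance prior, and the fixed scalar weights are hypothetical. TEM learns its injection weights through bi-level optimization, which this sketch does not reproduce.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity_matrix(tokens: Matrix) -> Matrix:
    """Pairwise cosine similarity between token vectors (one token per row)."""
    # Guard against zero vectors so the division is always defined.
    norms = [math.sqrt(sum(x * x for x in t)) or 1.0 for t in tokens]
    return [
        [sum(a * b for a, b in zip(ti, tj)) / (ni * nj) for tj, nj in zip(tokens, norms)]
        for ti, ni in zip(tokens, norms)
    ]


def topology_distortion(inputs: Matrix, hidden: Matrix) -> float:
    """Hypothetical drift measure: Frobenius distance between the similarity
    matrices of the input tokens and of the intermediate-layer tokens."""
    s_in = cosine_similarity_matrix(inputs)
    s_hid = cosine_similarity_matrix(hidden)
    return math.sqrt(
        sum((a - b) ** 2 for r_in, r_hid in zip(s_in, s_hid) for a, b in zip(r_in, r_hid))
    )


def positional_prior(n: int) -> Matrix:
    """Hypothetical positional bias: 0 on the diagonal, falling linearly with |i - j|."""
    return [[-abs(i - j) / max(n - 1, 1) for j in range(n)] for i in range(n)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def topology_enhanced_attention(
    scores: Matrix, inputs: Matrix, alpha: float, beta: float
) -> Matrix:
    """Add alpha * (input-token similarity) and beta * (positional prior) to the raw
    attention logits, then normalise each row with softmax. alpha and beta are fixed
    here; in TEM the corresponding weights are learned."""
    n = len(scores)
    sem = cosine_similarity_matrix(inputs)
    pos = positional_prior(n)
    return [
        softmax([scores[i][j] + alpha * sem[i][j] + beta * pos[i][j] for j in range(n)])
        for i in range(n)
    ]


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # three toy input tokens
    collapsed = [[1.0, 1.0]] * 3  # a toy layer output where all tokens look alike
    print(topology_distortion(x, x))  # 0.0
    print(topology_distortion(x, collapsed))  # positive: topology has drifted
    zero_logits = [[0.0] * 3 for _ in range(3)]
    print(topology_enhanced_attention(zero_logits, x, alpha=2.0, beta=0.5)[0])
```

With all-zero logits, the biased attention in row 0 favours token 0 itself and the similar token 1 over the dissimilar, distant token 2. Adding the bias to the logits before the softmax, rather than mixing probabilities afterwards, keeps each row a valid distribution without a separate renormalisation step.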
