TwinFormer: A Dual-Level Transformer for Long-Sequence Time-Series Forecasting
Authors
Mahima Kumavat, Aditya Maheshwari
Abstract
TwinFormer is a hierarchical Transformer for long-sequence time-series forecasting. It divides the input into non-overlapping temporal patches and processes them in two stages: (1) a Local Informer with top-k Sparse Attention models intra-patch dynamics, followed by mean pooling; (2) a Global Informer captures long-range inter-patch dependencies using the same top-k attention. A lightweight GRU aggregates the globally contextualized patch tokens for direct multi-horizon prediction. The resulting architecture achieves linear time and memory complexity. On eight real-world benchmark datasets spanning six domains (weather, stock prices, temperature, power consumption, electricity, and disease) and multiple forecasting horizons, TwinFormer consistently ranks among the top two methods on MAE and RMSE, frequently achieving the best result and otherwise the second-best, and it consistently outperforms PatchTST, iTransformer, FEDformer, Informer, and the vanilla Transformer. Ablations confirm the superiority of top-k Sparse Attention over ProbSparse attention and the effectiveness of GRU-based aggregation. Code is available at this repository: https://github.com/Mahimakumavat1205/TwinFormer.
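To make the two-stage design concrete, the following is a minimal sketch of the pipeline described above (patching, a Local Informer with top-k sparse attention, mean pooling, a Global Informer, and GRU aggregation with a direct multi-horizon head). It assumes a PyTorch implementation and illustrative hyperparameter names (patch_len, d_model, top_k, horizon); it is not the authors' reference code.

```python
# Hedged sketch of the TwinFormer architecture from the abstract.
# All module and parameter names here are illustrative assumptions.
import torch
import torch.nn as nn


class TopKSparseAttention(nn.Module):
    """Single-head attention keeping only the top-k scores per query."""
    def __init__(self, d_model: int, top_k: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.top_k = top_k
        self.scale = d_model ** -0.5

    def forward(self, x):                                   # x: (B, L, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale       # (B, L, L)
        k_eff = min(self.top_k, scores.size(-1))
        thresh = scores.topk(k_eff, dim=-1).values[..., -1:]  # k-th largest score
        scores = scores.masked_fill(scores < thresh, float("-inf"))
        return self.out(torch.softmax(scores, dim=-1) @ v)


class InformerBlock(nn.Module):
    """Pre-norm attention + feed-forward block, used at both levels."""
    def __init__(self, d_model: int, top_k: int):
        super().__init__()
        self.attn = TopKSparseAttention(d_model, top_k)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.n1, self.n2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + self.attn(self.n1(x))
        return x + self.ff(self.n2(x))


class TwinFormer(nn.Module):
    def __init__(self, patch_len=16, n_patches=32, d_model=64,
                 top_k=8, horizon=96):
        super().__init__()
        self.patch_len, self.n_patches = patch_len, n_patches
        self.embed = nn.Linear(1, d_model)            # per-step embedding (univariate)
        self.local = InformerBlock(d_model, top_k)    # intra-patch dynamics
        self.glob = InformerBlock(d_model, top_k)     # inter-patch dependencies
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, horizon)       # direct multi-horizon output

    def forward(self, x):                             # x: (B, n_patches * patch_len)
        B = x.size(0)
        # 1) split into non-overlapping patches and embed each time step
        x = x.view(B * self.n_patches, self.patch_len, 1)
        x = self.local(self.embed(x))                 # 2) Local Informer
        x = x.mean(dim=1).view(B, self.n_patches, -1) # 3) mean-pool to patch tokens
        x = self.glob(x)                              # 4) Global Informer
        _, h = self.gru(x)                            # 5) GRU aggregation
        return self.head(h[-1])                       # (B, horizon)


if __name__ == "__main__":
    model = TwinFormer()
    y = model(torch.randn(4, 32 * 16))
    print(y.shape)                                    # torch.Size([4, 96])
```

Because attention is applied only within each fixed-length patch and across the fixed number of patch tokens, the cost grows linearly with the input length, consistent with the complexity claim in the abstract.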