TimeCapsule: Solving the Jigsaw Puzzle of Long-Term Time Series Forecasting with Compressed Predictive Representations
Yihang Lu, Yangyang Xu, Qitao Qing, Xianwei Meng
TL;DR
TimeCapsule tackles long-term multivariate time series forecasting by treating the data as a 3D tensor (temporal, variate, level) and applying high-dimensional information compression to unify several common modeling principles. The approach combines mode-specific multi-head self-attention (MoMSA) with low-rank compression, residual information back, and JEPA-based internal prediction to learn compact, predictive representations, which a lightweight decoder maps back to the temporal domain. Experiments on ten datasets show competitive or state-of-the-art performance, with notable improvements on very long horizons, and offer insight into when multi-level modeling is most beneficial. The work demonstrates the practicality of a simple, compression-centric design that balances expressivity and efficiency. Its stated limitations concern fixed compression sizes and how individual components are used, both left open for future generalization.
Abstract
Recent deep learning models for Long-term Time Series Forecasting (LTSF) often emphasize complex, handcrafted designs, while simpler architectures like linear models or MLPs have often outperformed these intricate solutions. In this paper, we revisit and organize the core ideas behind several key techniques, such as redundancy reduction and multi-scale modeling, which are frequently employed in advanced LTSF models. Our goal is to streamline these ideas for more efficient deep learning utilization. To this end, we introduce TimeCapsule, a model built around the principle of high-dimensional information compression that unifies these techniques in a generalized yet simplified framework. Specifically, we model time series as a 3D tensor, incorporating temporal, variate, and level dimensions, and leverage mode production to capture multi-mode dependencies while achieving dimensionality compression. We propose an internal forecast within the compressed representation domain, supported by the Joint-Embedding Predictive Architecture (JEPA), to monitor the learning of predictive representations. Extensive experiments on challenging benchmarks demonstrate the versatility of our method, showing that TimeCapsule can achieve state-of-the-art performance.
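To make the idea of compressing a 3D tensor along each mode concrete, the following is a minimal NumPy sketch of a mode-n product applied in turn to the temporal, variate, and level dimensions. It is an illustration under stated assumptions, not the authors' implementation: the helper `mode_product`, the tensor sizes, the compressed sizes, and the random projection matrices (standing in for learned ones) are all hypothetical, and the attention, residual, and JEPA components are omitted.

```python
import numpy as np

def mode_product(x, u, mode):
    """Mode-n product: contract axis `mode` of tensor x with axis 1 of matrix u.

    x has shape (..., d, ...) with d at position `mode`; u has shape (r, d).
    The result has the same shape as x except that axis `mode` has size r.
    """
    # tensordot places u's remaining axis (size r) first, followed by x's other axes.
    out = np.tensordot(u, x, axes=(1, mode))
    # Move the new axis back to the position of the contracted one.
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)

# Hypothetical sizes: T time steps, V variates, L levels.
T, V, L = 96, 7, 4
x = rng.standard_normal((T, V, L))

# Random stand-ins for learned low-rank projections (assumed compressed sizes).
u_t = rng.standard_normal((16, T))  # temporal mode: 96 -> 16
u_v = rng.standard_normal((3, V))   # variate mode:  7 -> 3
u_l = rng.standard_normal((2, L))   # level mode:    4 -> 2

# Compress one mode at a time.
z = mode_product(x, u_t, mode=0)
z = mode_product(z, u_v, mode=1)
z = mode_product(z, u_l, mode=2)

print(x.shape, "->", z.shape)
```

Each call shrinks only one dimension and leaves the other two untouched, which is why the three projections can be applied sequentially and inspected independently.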
