Frozen in Time: Parameter-Efficient Time Series Transformers via Reservoir-Induced Feature Expansion and Fixed Random Dynamics
Pradeep Singh, Mehak Sharma, Anupriya Dey, Balasubramanian Raman
TL;DR
FreezeTST addresses long-horizon forecasting by inserting a frozen reservoir between self-attention blocks, which provides long memory with fixed parameters while training only a small, adaptive head. The approach halves trainable parameters, reduces training time, and extends effective memory without changing inference cost. It is backed by a 1-Lipschitz guarantee and a closed-form memory-horizon bound that links the leak rate and the spectral radius. Empirical evaluation on seven long-term forecasting datasets shows performance competitive with state-of-the-art Transformer baselines, along with strong efficiency benefits and ablations that support the freezing strategy. This work provides a principled, scalable path to resource-efficient, high-quality long-range time-series forecasting in real-world settings.
Abstract
Transformers are the de facto choice for sequence modelling, yet their quadratic self-attention and weak temporal bias can make long-range forecasting both expensive and brittle. We introduce FreezeTST, a lightweight hybrid that interleaves frozen random-feature (reservoir) blocks with standard trainable Transformer layers. The frozen blocks endow the network with rich nonlinear memory at no optimisation cost; the trainable layers learn to query this memory through self-attention. The design cuts the number of trainable parameters and lowers wall-clock training time, while leaving inference complexity unchanged. On seven standard long-term forecasting benchmarks, FreezeTST consistently matches or surpasses specialised variants such as Informer, Autoformer, and PatchTST, with substantially lower compute. Our results show that embedding reservoir principles within Transformers offers a simple, principled route to efficient long-term time-series prediction.
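
To make the idea of a frozen reservoir block concrete, the following is a minimal sketch of a generic leaky echo-state update with fixed random weights. It is not the authors' implementation: the function names (`make_frozen_reservoir`, `reservoir_step`, `memory_horizon`), the uniform initialisation, and the default values are all hypothetical. The sketch also rescales the recurrent matrix by its infinity norm, which only upper-bounds the spectral radius, as a simple stand-in for spectral-radius scaling, and the horizon estimate is the standard contraction argument rather than the closed-form bound referred to in this paper.

```python
import math
import random


def make_frozen_reservoir(n_in, n_res, rho=0.9, seed=0):
    """Draw fixed random weights; they are never updated afterwards."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_res)] for _ in range(n_res)]
    U = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_res)]
    # Assumption: rescale W so its largest absolute row sum (infinity norm)
    # equals rho. This norm upper-bounds the spectral radius of W.
    norm = max(sum(abs(w) for w in row) for row in W)
    W = [[w * rho / norm for w in row] for row in W]
    return W, U


def reservoir_step(h, x, W, U, leak=0.3):
    """One leaky update: h' = (1 - leak) * h + leak * tanh(W h + U x)."""
    new_h = []
    for w_row, u_row, h_i in zip(W, U, h):
        recurrent = sum(w * h_j for w, h_j in zip(w_row, h))
        driven = sum(u * x_j for u, x_j in zip(u_row, x))
        new_h.append((1.0 - leak) * h_i + leak * math.tanh(recurrent + driven))
    return new_h


def memory_horizon(leak, rho, eps=1e-3):
    """Steps until a state perturbation shrinks below eps (contraction estimate)."""
    c = (1.0 - leak) + leak * rho  # per-step Lipschitz constant of the update
    if not 0.0 < c < 1.0:
        raise ValueError("need a strictly contractive update (0 < c < 1)")
    return math.ceil(math.log(eps) / math.log(c))


# Drive the frozen reservoir with a short two-channel input sequence.
W, U = make_frozen_reservoir(n_in=2, n_res=8, rho=0.9, seed=0)
h = [0.0] * 8
for x in ([0.1, -0.2], [0.3, 0.0], [-0.1, 0.4]):
    h = reservoir_step(h, x, W, U, leak=0.3)
```

Because tanh is 1-Lipschitz, two states fed the same input move closer by a factor of at most `(1 - leak) + leak * rho` per step in the infinity norm. A smaller leak or a scaling factor closer to 1 therefore lengthens the estimated horizon, which is the kind of trade-off between leak and spectral radius that the summary above alludes to. In a hybrid model, the states `h` would be passed to the trainable attention layers while `W` and `U` stay fixed.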
