Frozen in Time: Parameter-Efficient Time Series Transformers via Reservoir-Induced Feature Expansion and Fixed Random Dynamics

Pradeep Singh, Mehak Sharma, Anupriya Dey, Balasubramanian Raman

TL;DR

FreezeTST addresses long-horizon forecasting by inserting a frozen reservoir between self-attention blocks, which provides long memory with fixed parameters while only a small, adaptive head is trained. The approach halves trainable parameters, reduces training time, and extends effective memory without changing inference cost. It is backed by a 1-Lipschitz guarantee and a closed-form memory-horizon bound linking the leak rate and the spectral radius. Empirical evaluation on seven LSTF datasets shows accuracy competitive with state-of-the-art Transformer baselines, with strong efficiency benefits and ablations that support the freezing strategy. The work offers a principled, scalable path to resource-efficient, high‑quality long-range time-series forecasting in real-world settings.

Abstract

Transformers are the de facto choice for sequence modelling, yet their quadratic self-attention and weak temporal bias can make long-range forecasting both expensive and brittle. We introduce FreezeTST, a lightweight hybrid that interleaves frozen random-feature (reservoir) blocks with standard trainable Transformer layers. The frozen blocks endow the network with rich nonlinear memory at no optimisation cost; the trainable layers learn to query this memory through self-attention. The design cuts trainable parameters and lowers wall-clock training time, while leaving inference complexity unchanged. On seven standard long-term forecasting benchmarks, FreezeTST consistently matches or surpasses specialised variants such as Informer, Autoformer, and PatchTST, with substantially lower compute. Our results show that embedding reservoir principles within Transformers offers a simple, principled route to efficient long-term time-series prediction.

Paper Structure

This paper contains 33 sections, 3 theorems, 30 equations, 4 figures, 7 tables.

Key Result

Proposition 1

Let the reservoir state $\mathbf{h}_t\in\mathbb{R}^{N_h}$ evolve via the reservoir update (eq:reservoir-update), where $\phi:\mathbb{R}\to\mathbb{R}$ is $L_\phi$‑Lipschitz ($L_\phi\le1$) and the recurrent weight matrix satisfies $\|W_r\|_2\le\alpha<1$, so the linear part is contractive. Let two input sequences $\{\mathbf x^{(1)}_s\}_{s\le t}$ and $\{\mathbf x^{(2)}_s\}_{s\le t}$ be identical except at time $t-\tau$,

Figures (4)

  • Figure 1: Architecture of Freeze Time Series Transformer (FreezeTST)
  • Figure 2: MSE as a function of the number of encoder layers. Panels: Top‑left: ETTh1 ($H = 96$); top‑right: ETTh1 ($H = 720$); bottom‑left: ETTm2 ($H = 96$); bottom‑right: ETTm2 ($H = 720$).
  • Figure 3: MSE on ETTh1 as the number of encoder layers varies from 3 to 10 (horizon $H = 96$, look‑back window $T = 336$).
  • Figure 4: MSE as a function of the number of encoder layers across different prediction lengths and freezing schemes. Panels: Top‑left: ETTh1 ($H = 192$); top‑right: ETTh1 ($H = 336$); bottom‑left: ETTm2 ($H = 192$); bottom‑right: ETTm2 ($H = 336$).

Theorems & Definitions (6)

  • Proposition 1: Exponential forgetting and receptive-field length
  • proof
  • Proposition 2: Reservoir Stability Under $\rho(W_{\mathrm{res}}) < 1$
  • proof
  • Proposition 3: Non-expansiveness and gradient bound
  • proof
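
The exponential-forgetting behaviour named in Proposition 1 can be checked numerically with a small sketch. The reservoir update equation is not reproduced on this page, so the code below assumes a standard leaky echo-state update, $\mathbf h_t = (1-a)\,\mathbf h_{t-1} + a\tanh(W_r\mathbf h_{t-1} + W_{\mathrm{in}}\mathbf x_t)$, which may differ from the paper's exact form. The names `make_reservoir`, `run_reservoir`, `gap_and_bound`, the leak symbol $a$, and all sizes and seeds are hypothetical choices made for illustration. Under that assumption, with $\|W_r\|_2\le\alpha<1$ and a 1-Lipschitz $\tanh$, each step shrinks the distance between two trajectories by at most $\gamma=(1-a)+a\alpha$, so a perturbation injected $\tau$ steps in the past changes the final state by at most $\gamma^{\tau}\,a\,\|W_{\mathrm{in}}\Delta\mathbf x\|$.

```python
import numpy as np


def make_reservoir(n_hidden, n_in, alpha, seed=0):
    """Draw fixed random weights; rescale W_r so its spectral norm equals alpha."""
    rng = np.random.default_rng(seed)
    w_r = rng.standard_normal((n_hidden, n_hidden))
    w_r *= alpha / np.linalg.norm(w_r, 2)  # ord=2 on a matrix is the spectral norm
    w_in = rng.standard_normal((n_hidden, n_in))
    return w_r, w_in


def run_reservoir(w_r, w_in, xs, leak):
    """Assumed leaky update (not taken from the paper):
    h_t = (1 - leak) * h_{t-1} + leak * tanh(W_r h_{t-1} + W_in x_t)."""
    h = np.zeros(w_r.shape[0])
    for x in xs:
        h = (1.0 - leak) * h + leak * np.tanh(w_r @ h + w_in @ x)
    return h


def gap_and_bound(tau, steps=200, n_hidden=64, n_in=4, alpha=0.9, leak=0.5):
    """Final-state distance for two inputs that differ only at time t - tau,
    together with the contraction bound gamma**tau * leak * ||W_in dx||."""
    w_r, w_in = make_reservoir(n_hidden, n_in, alpha)
    rng = np.random.default_rng(1)
    xs1 = rng.standard_normal((steps, n_in))
    dx = np.ones(n_in)
    xs2 = xs1.copy()
    xs2[steps - 1 - tau] += dx  # perturb exactly one time step
    h1 = run_reservoir(w_r, w_in, xs1, leak)
    h2 = run_reservoir(w_r, w_in, xs2, leak)
    gap = np.linalg.norm(h1 - h2)
    gamma = (1.0 - leak) + leak * alpha  # per-step contraction factor
    bound = gamma ** tau * leak * np.linalg.norm(w_in @ dx)
    return gap, bound


if __name__ == "__main__":
    for tau in (0, 5, 20, 40):
        gap, bound = gap_and_bound(tau)
        print(f"tau={tau:3d}  gap={gap:.3e}  bound={bound:.3e}")
```

Under the same assumed update, requiring $\gamma^{\tau}\le\varepsilon$ gives $\tau\ge\ln(1/\varepsilon)/\ln(1/\gamma)$, which is one way a memory horizon can depend on the leak and on the norm of the recurrent matrix. This is a generic contraction argument and is not claimed to be the paper's closed-form bound.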