time2time: Causal Intervention in Hidden States to Simulate Rare Events in Time Series Foundation Models
Debdeep Sanyal, Aaryan Nagpal, Dhruv Kumar, Murari Mandal, Saurabh Deshpande
TL;DR
Time2time shows that large time series transformers internalize semantically meaningful market regimes rather than merely fitting curves. The authors propose activation transplantation, a causal intervention that imposes the activation statistics of a style regime onto a target regime at layer $l$ in order to steer forecasts deterministically. The target activations are replaced by $\tilde{A}_l(X_{target}) = \left( \frac{A_l(X_{target}) - \mu_l(X_{target})}{\sigma_l(X_{target})+\epsilon} \right) \odot \sigma_l(X_{style}) + \mu_l(X_{style})$, where $\mu_l$ and $\sigma_l$ denote the mean and standard deviation of the layer-$l$ activations and $\epsilon$ is a small constant for numerical stability, and the forward pass then resumes from layer $l$. Across Toto and Chronos, and on both real NASDAQ-100 data and synthetic series, the method reveals a continuous severity axis for crashes: latent norms correlate with crash magnitude, and a low-dimensional, regime-specific subspace emerges in deeper layers. These findings shift interpretability from post-hoc attribution to direct causal manipulation, enabling semantic what-if stress-testing in financial forecasting.
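The transplantation formula in the TL;DR can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that activations at a layer are stored as a (tokens, hidden_dim) array and that $\mu_l$ and $\sigma_l$ are computed per hidden channel over the token axis; the summary does not state which axis is used. The function name `transplant_activations` and the synthetic "calm" and "crash" arrays are hypothetical stand-ins for real hidden states.

```python
import numpy as np


def transplant_activations(a_target, a_style, eps=1e-5):
    """Impose the per-channel mean/std of a_style onto a_target.

    Both inputs have shape (tokens, hidden_dim). Token counts may differ,
    because only the style statistics are carried over.
    Assumption: statistics are taken over the token axis (axis 0).
    """
    mu_t = a_target.mean(axis=0, keepdims=True)
    sigma_t = a_target.std(axis=0, keepdims=True)
    mu_s = a_style.mean(axis=0, keepdims=True)
    sigma_s = a_style.std(axis=0, keepdims=True)

    # Standardize the target activations, then rescale and shift them
    # so that they carry the style regime's moments.
    normalized = (a_target - mu_t) / (sigma_t + eps)
    return normalized * sigma_s + mu_s


# Synthetic stand-ins for layer activations (not real model states).
rng = np.random.default_rng(0)
calm = rng.normal(loc=0.0, scale=1.0, size=(64, 8))    # target regime
crash = rng.normal(loc=-3.0, scale=4.0, size=(48, 8))  # style regime

steered = transplant_activations(calm, crash)
```

After the call, `steered` keeps the shape of `calm` but its per-channel mean and standard deviation match those of `crash`. In an actual model, the steered array would replace the layer's output before the remaining layers run, for example through a forward hook.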
Abstract
While transformer-based foundation models excel at forecasting routine patterns, two questions remain: do they internalize semantic concepts such as market regimes, or merely fit curves? And can their internal representations be leveraged to simulate rare, high-stakes events such as market crashes? To investigate this, we introduce activation transplantation, a causal intervention that manipulates hidden states by imposing the statistical moments of one event (e.g., a historical crash) onto another (e.g., a calm period) during the forward pass. This procedure deterministically steers forecasts: injecting crash semantics induces downturn predictions, while injecting calm semantics suppresses crashes and restores stability. Beyond binary control, we find that models encode a graded notion of event severity, with the latent vector norm directly correlating with the magnitude of systemic shocks. Validated across two architecturally distinct TSFMs, Toto (decoder-only) and Chronos (encoder-decoder), our results demonstrate that steerable, semantically grounded representations are a robust property of large time series transformers. Our findings provide evidence for a latent concept space that governs model predictions, shifting interpretability from post-hoc attribution to direct causal intervention, and enabling semantic "what-if" analysis for strategic stress-testing.
