time2time: Causal Intervention in Hidden States to Simulate Rare Events in Time Series Foundation Models

Debdeep Sanyal, Aaryan Nagpal, Dhruv Kumar, Murari Mandal, Saurabh Deshpande

TL;DR

The time2time study shows that large time series transformers internalize semantically meaningful market regimes rather than merely fitting curves. The authors propose activation transplantation, a causal intervention that transfers activation statistics from a style regime to a target regime to deterministically steer forecasts. At layer $l$, the target activations are replaced by $\tilde{A}_l(X_{\text{target}}) = \left( \frac{A_l(X_{\text{target}}) - \mu_l(X_{\text{target}})}{\sigma_l(X_{\text{target}})+\epsilon} \right) \odot \sigma_l(X_{\text{style}}) + \mu_l(X_{\text{style}})$, and the forward pass then resumes from the modified hidden state. Across Toto and Chronos, on both real NASDAQ-100 data and synthetic series, the method reveals a continuous severity axis for crashes: latent norms correlate with crash magnitude, and a low-dimensional, regime-specific subspace emerges in deeper layers. These findings shift interpretability from post-hoc attribution to direct causal manipulation, enabling semantic what-if stress-testing in financial forecasting.
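The transplantation formula can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the authors' code: it assumes activations are arrays of shape (tokens, channels), that $\mu_l$ and $\sigma_l$ are computed per channel over the token axis, and it replaces Toto or Chronos with a toy stack of random `tanh` layers. The names `transplant`, `capture`, and `run` are illustrative. The intervention is applied at one layer and the remaining layers consume the modified state, mirroring "resuming the forward pass".

```python
import numpy as np


def transplant(a_target, a_style, eps=1e-5):
    """Impose the style moments on the target activations.

    Assumption: activations are (tokens, channels) arrays and the mean and
    standard deviation are taken per channel over the token axis.
    """
    mu_t = a_target.mean(axis=0, keepdims=True)
    sd_t = a_target.std(axis=0, keepdims=True)
    mu_s = a_style.mean(axis=0, keepdims=True)
    sd_s = a_style.std(axis=0, keepdims=True)
    # Standardize with target statistics, then rescale and shift with style statistics.
    return (a_target - mu_t) / (sd_t + eps) * sd_s + mu_s


def capture(layers, x, layer_idx):
    """Return the activations produced by layer `layer_idx` for input x."""
    h = x
    for layer in layers[: layer_idx + 1]:
        h = layer(h)
    return h


def run(layers, x, layer_idx=None, style_acts=None):
    """Run all layers; optionally transplant at `layer_idx`, then keep going."""
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == layer_idx:
            h = transplant(h, style_acts)  # later layers see the intervened state
    return h


# Hypothetical toy "model": random tanh layers standing in for transformer blocks.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(8, 8)) for _ in range(4)]
layers = [lambda h, w=w: np.tanh(h @ w) for w in weights]

x_calm = rng.normal(0.0, 0.1, size=(32, 8))    # stand-in for a calm window
x_crash = rng.normal(-1.0, 1.0, size=(32, 8))  # stand-in for a crash window

style = capture(layers, x_crash, layer_idx=1)
steered = run(layers, x_calm, layer_idx=1, style_acts=style)
baseline = run(layers, x_calm)
```

With a real PyTorch model, the same substitution would typically be placed in a forward hook on the chosen block, because a forward hook that returns a value replaces that module's output; which module to hook in Toto or Chronos is not specified here.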

Abstract

While transformer-based foundation models excel at forecasting routine patterns, two questions remain: do they internalize semantic concepts such as market regimes, or merely fit curves? And can their internal representations be leveraged to simulate rare, high-stakes events such as market crashes? To investigate this, we introduce activation transplantation, a causal intervention that manipulates hidden states by imposing the statistical moments of one event (e.g., a historical crash) onto another (e.g., a calm period) during the forward pass. This procedure deterministically steers forecasts: injecting crash semantics induces downturn predictions, while injecting calm semantics suppresses crashes and restores stability. Beyond binary control, we find that models encode a graded notion of event severity, with the latent vector norm directly correlating with the magnitude of systemic shocks. Validated across two architecturally distinct time series foundation models (TSFMs), Toto (decoder-only) and Chronos (encoder-decoder), our results demonstrate that steerable, semantically grounded representations are a robust property of large time series transformers. Our findings provide evidence for a latent concept space that governs model predictions, shifting interpretability from post-hoc attribution to direct causal intervention and enabling semantic "what-if" analysis for strategic stress-testing.
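The graded-severity finding amounts to a correlation between a latent norm and a shock magnitude. The sketch below shows one way to compute it, under assumptions not stated in the abstract: the "latent vector" is taken to be the token-averaged activation at a single layer, and severity is measured as the maximum peak-to-trough drawdown of the price series. The helper names are hypothetical, and the toy activations are constructed so that their offset grows with magnitude; they are not results from the paper.

```python
import numpy as np


def max_drawdown(prices):
    """Largest peak-to-trough fall, as a fraction of the running peak (assumed severity measure)."""
    p = np.asarray(prices, dtype=float)
    peaks = np.maximum.accumulate(p)
    return float(np.max((peaks - p) / peaks))


def latent_norm(acts):
    """L2 norm of the token-averaged (tokens, channels) activation (assumed latent summary)."""
    return float(np.linalg.norm(acts.mean(axis=0)))


def norm_severity_correlation(acts_per_event, magnitudes):
    """Pearson correlation between per-event latent norms and crash magnitudes."""
    norms = np.array([latent_norm(a) for a in acts_per_event])
    r = np.corrcoef(norms, np.asarray(magnitudes, dtype=float))[0, 1]
    return norms, float(r)


# Toy illustration only: activations whose offset scales with event magnitude.
rng = np.random.default_rng(0)
magnitudes = [0.1, 0.2, 0.35, 0.5]
acts = [rng.normal(loc=5 * m, scale=0.1, size=(64, 16)) for m in magnitudes]
norms, r = norm_severity_correlation(acts, magnitudes)
```

Pearson correlation only captures a linear relationship; a rank correlation would be a reasonable alternative if the norm-severity relation is monotone but curved.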

Paper Structure

This paper contains 15 sections, 9 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Overview of the proposed time2time intervention. We extract the statistical moments (mean and standard deviation) of hidden activations at layer $l$ from a style event, standardize the target activations, and re-standardize them with the style statistics. This activation transplantation implants the dynamics of one event (e.g., a market crash) into another (e.g., a calm period), steering the model’s forecast accordingly.
  • Figure 2: Forecast interventions via activation transplantation. We intervene at layer $l=8$ in both models by transferring the statistical moments of hidden activations between regimes. X-axis: time step (days); y-axis: NASDAQ-100 index. Top rows: calm periods transplanted with crash statistics, which deterministically induce downturn forecasts that simulate stress tests. Bottom rows: crash periods transplanted with calm statistics, which suppress downturns and restore stability. Shaded regions show the 50% and 90% prediction intervals of the intervened forecasts, and the green lines show the median forecasts of Toto and Chronos, respectively. (Chronos ablations are reported in a separate section.)
  • Figure 3: Cross-regime similarity in reduced latent subspace. (a) crash–calm pairs are strongly anti-correlated in early layers but gradually align. (b) crash–crash pairs rapidly converge into a coherent latent subspace by mid layers.
  • Figure 4: Cross-crash interventions reveal graded severity. Forecasts generated by transplanting crash signatures show that the forecast trajectory systematically deepens under severe signatures and is mitigated under milder ones, demonstrating that TSFMs encode crash events along a continuous latent severity axis.
  • Figure 5: Synthetic series generation. (a) Calm trajectory initialized at $X_0=2000$ and (b) crash trajectory initialized at $X_0=5000$, both generated with the paper's synthetic time series equation using the calm and crash parameter settings, respectively.
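Figure 3 compares regimes in a reduced latent subspace. One plausible way to compute such a similarity, sketched here under stated assumptions, is to summarize each event by a single vector (for example, its mean activation at a given layer), fit a PCA subspace to the pooled event vectors via SVD, and take the cosine similarity of two events' projections; repeating this per layer would produce curves of the kind the figure describes. The PCA choice, the subspace size `k`, the helper names, and the toy data are all hypothetical, not the authors' procedure.

```python
import numpy as np


def fit_subspace(event_vecs, k):
    """Fit a k-dimensional PCA subspace (via SVD) to a pool of per-event vectors."""
    X = np.asarray(event_vecs, dtype=float)
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]  # rows of vt are principal directions, largest variance first


def subspace_cosine(u, v, mu, comps):
    """Cosine similarity of two event vectors after projecting into the subspace."""
    pu = (np.asarray(u, dtype=float) - mu) @ comps.T
    pv = (np.asarray(v, dtype=float) - mu) @ comps.T
    return float(pu @ pv / (np.linalg.norm(pu) * np.linalg.norm(pv) + 1e-12))


# Toy pool (hypothetical): crash events cluster along +direction, calm along -direction.
rng = np.random.default_rng(0)
d = 32
direction = rng.normal(size=d)
crash = [2 * direction + rng.normal(scale=0.3, size=d) for _ in range(6)]
calm = [-2 * direction + rng.normal(scale=0.3, size=d) for _ in range(6)]

mu, comps = fit_subspace(crash + calm, k=3)
crash_crash = subspace_cosine(crash[0], crash[1], mu, comps)
crash_calm = subspace_cosine(crash[0], calm[0], mu, comps)
```

Fitting the subspace on a pool of many events matters here: with only two events, centering on their mean forces the two projections to be exact opposites, so the cosine would be $-1$ regardless of the regimes involved.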