Are Time-Series Foundation Models Deployment-Ready? A Systematic Study of Adversarial Robustness Across Domains

Jiawen Zhang, Zhenwei Zhang, Shun Zheng, Xumeng Wen, Jia Li, Jiang Bian

TL;DR

The paper systematically evaluates the adversarial robustness of Time-Series Foundation Models (TSFMs) using a time-series–grounded framework that normalizes perturbation budgets and unifies evaluation across white-box and black-box settings. It reveals that current TSFMs are highly brittle, with vulnerabilities such as horizon-proximal brittleness and context-length amplification, and that attack transfer across models is limited. Targeted and untargeted attacks can steer forecasts toward attacker-defined trajectories even at small budgets, highlighting safety risks in deployment. The authors demonstrate that lightweight defenses like latent or input-space adversarial training substantially improve worst-case robustness and can transfer across domains, offering a viable path toward deployment-ready TSFMs. Overall, the work underscores that robustness should be treated as a prerequisite alongside accuracy for safe TSFM deployment in real-world decision making.

Abstract

Time-Series Foundation Models (TSFMs) are rapidly transitioning from research prototypes to core components of critical decision-making systems, driven by their impressive zero-shot forecasting capabilities. However, as their deployment surges, a critical blind spot remains: their fragility under adversarial attacks. This lack of scrutiny poses severe risks, particularly as TSFMs enter high-stakes environments vulnerable to manipulation. We present a systematic, diagnostic study arguing that for TSFMs, robustness is not merely a secondary metric but a prerequisite for trustworthy deployment comparable to accuracy. Our evaluation framework, explicitly tailored to the unique constraints of time series, incorporates normalized, sparsity-aware perturbation budgets and unified scale-invariant metrics across white-box and black-box settings. Across six representative TSFMs, we demonstrate that current architectures are alarmingly brittle: even small perturbations can reliably steer forecasts toward specific failure modes, such as trend flips and malicious drifts. We uncover TSFM-specific vulnerability patterns, including horizon-proximal brittleness, increased susceptibility with longer context windows, and weak cross-model transfer that points to model-specific failure modes rather than generic distortions. Finally, we show that simple adversarial fine-tuning offers a cost-effective path to substantial robustness gains, even with out-of-domain data. This work bridges the gap between TSFM capabilities and safety constraints, offering essential guidance for hardening the next generation of forecasting systems.

Paper Structure

This paper contains 83 sections, 21 equations, 9 figures, 16 tables, 2 algorithms.

Figures (9)

  • Figure 1: Visualization of targeted adversarial attacks on TSFMs across diverse domains. This figure illustrates how adversarial perturbations guide model forecasts toward specific target behaviors (i.e., a transformed target). All perturbations use a variance-normalized budget, where $\epsilon$ is the per-step bound after normalization and $r$ is the fraction of perturbed time steps. $q$ denotes the prediction quantile (e.g., $q=0.5$ for the median, $q=0.1/0.9$ for uncertainty bands).
  • Figure 2: Overview of the adversarial evaluation protocol for time-series foundation models. We evaluate TSFMs across diverse domains under a unified adversarial framework. Adversarial perturbations are applied to clean inputs, which are then passed through the TSFM to produce perturbed forecasts. We assess the impact of these attacks using both accuracy and robustness metrics.
  • Figure 3: Visualization of untargeted and targeted adversarial attacks on TSFMs. $q$ denotes the prediction quantile (e.g., $q=0.5$ for the median, $q=0.1/0.9$ for uncertainty bands). The parameter $a$ controls the scaling of the target trajectory (with default $a = 1$), and $b$ controls the additive drift or offset (with default $b = 0$). Detailed RED$_\text{NMAE}$ scores are provided in Appendix \ref{app:target_att}.
  • Figure 4: Effects of perturbation location and context length on adversarial robustness. (a) Darker colors indicate that a time point is more frequently identified as one of the most vulnerable. We compute the gradient magnitude with respect to the attack objective and select the top-25 positions as the most sensitive points. (b) NMAE under varying context lengths and attack ratios with PGD and SimBA attacks ($\epsilon = 0.5$). Higher values indicate greater performance degradation.
  • Figure 5: Effectiveness of untargeted adversarial attacks across different strategies. We evaluate PGD, SimBA (with wavelet, point, and DCT bases), and ZOO (with standard and Adam optimizers) under a fixed budget of $\epsilon = 0.5$, $r = 1$. Results are averaged over all datasets.
  • ...and 4 more figures
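
The captions above describe a variance-normalized budget with a per-step bound $\epsilon$ and a perturbed fraction $r$. The following is a minimal sketch of how such a constraint might be enforced on a candidate perturbation. It is not the authors' implementation: the function name `project_to_budget`, the use of the context's population standard deviation as the normalizer, and the rule of keeping the largest-magnitude entries are all assumptions made for illustration.

```python
import math
import statistics


def project_to_budget(x, delta, epsilon=0.5, r=1.0):
    """Project a raw perturbation onto a variance-normalized, sparse budget.

    Assumptions (illustrative, not taken from the paper):
      * the normalizer is the population standard deviation of the context x;
      * the perturbed steps are the k largest-magnitude entries of delta,
        with k = ceil(r * len(x)).
    """
    if len(x) != len(delta):
        raise ValueError("x and delta must have the same length")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")

    # Fall back to a unit scale for constant series so the bound stays finite.
    scale = statistics.pstdev(x) or 1.0
    bound = epsilon * scale

    # Number of time steps allowed to change; the small offset guards
    # against floating-point overshoot in r * len(x).
    k = math.ceil(r * len(x) - 1e-9)
    ranked = sorted(range(len(x)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])

    # Clip kept entries to [-bound, bound]; zero out everything else.
    return [
        max(-bound, min(bound, d)) if i in keep else 0.0
        for i, d in enumerate(delta)
    ]


context = [0.0, 2.0, 0.0, 2.0]   # population std = 1.0
raw = [3.0, -0.2, 0.1, -4.0]     # unconstrained candidate perturbation
sparse = project_to_budget(context, raw, epsilon=0.5, r=0.5)
dense = project_to_budget(context, raw, epsilon=0.5, r=1.0)
adversarial = [xi + di for xi, di in zip(context, sparse)]
```

With $r = 0.5$ only two of the four steps are modified, each by at most $0.5$ standard deviations. With $r = 1$ every step may move, which corresponds to the $\epsilon = 0.5$, $r = 1$ setting quoted in the Figure 5 caption. An attack such as PGD would apply a projection of this kind after each update step, although the paper's exact position-selection rule may differ.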