Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting

Fuqiang Liu, Sicong Jiang, Luis Miranda-Moreno, Seongjin Choi, Lijun Sun

TL;DR

This paper investigates the adversarial vulnerabilities of large language models (LLMs) used for time series forecasting. It introduces Directional Gradient Approximation (DGA), a gradient-free, black-box attack that crafts imperceptible input perturbations and degrades forecasts far more than Gaussian white noise (GWN) across five real-world datasets and multiple architectures. The authors formalize the forecasting problem, define a strict black-box threat model, and show through extensive experiments that LLMTime, TimeGPT, and TimeLLM forecasters are all susceptible, while non-LLM baselines are comparatively more robust. Interpretation analyses show that attacked forecasts shift in distribution toward random-walk-like behavior. The paper also discusses mitigation and highlights preprocessing-based defenses as the practical countermeasure, given the high cost of adversarial training. Overall, the work underscores the need for robust defenses before LLM-based forecasters are deployed in high-stakes domains.

Abstract

Large Language Models (LLMs) have recently demonstrated significant potential in time series forecasting, offering impressive capabilities in handling complex temporal data. However, their robustness and reliability in real-world applications remain under-explored, particularly concerning their susceptibility to adversarial attacks. In this paper, we introduce a targeted adversarial attack framework for LLM-based time series forecasting. By employing gradient-free, black-box optimization, we generate minimal yet highly effective perturbations that significantly degrade forecasting accuracy across multiple datasets and LLM architectures. Our experiments, which cover LLMTime (with GPT-3.5, GPT-4, LLaMa, and Mistral), TimeGPT, and TimeLLM, show that adversarial attacks cause much more severe performance degradation than random noise and that our attacks are broadly effective across different LLMs. The results underscore the critical vulnerabilities of LLMs in time series forecasting, highlighting the need for robust defense mechanisms to ensure their reliable deployment in practical applications. The code repository can be found at https://github.com/JohnsonJiang1996/AdvAttack_LLM4TS.
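
To make the idea of a gradient-free, black-box, targeted perturbation concrete, the following is a minimal sketch and not the authors' implementation (see the repository above for that). It assumes a single random probe, a finite-difference gradient estimate, a random-noise target, and a sign step bounded relative to the mean input magnitude. The names `toy_forecaster` and `dga_attack`, the parameters `eps` and `probe_scale`, the MAE loss, and the descent sign convention are all hypothetical choices made for illustration.

```python
import numpy as np


def toy_forecaster(history, horizon=8):
    """Hypothetical stand-in for a black-box forecaster (not an LLM):
    extrapolates the last observed slope for `horizon` steps."""
    slope = history[-1] - history[-2]
    return history[-1] + slope * np.arange(1, horizon + 1)


def dga_attack(forecast, x, eps=0.02, probe_scale=0.01, rng=None):
    """Sketch of a directional, gradient-free perturbation.

    Only forward queries to `forecast` are used (black-box assumption).
    Assumed recipe: estimate the loss gradient from one random probe,
    then take a bounded sign step toward a random-noise target.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    scale = np.abs(x).mean() + 1e-8

    clean = forecast(x)
    # Assumed target: Gaussian noise matching the clean forecast's statistics.
    target = rng.normal(clean.mean(), clean.std() + 1e-8, size=clean.shape)

    # Random probe direction; guard against exact zeros before dividing.
    theta = rng.normal(0.0, probe_scale * scale, size=x.shape)
    theta[theta == 0.0] = probe_scale * scale

    # Finite-difference estimate of d(loss)/d(x) from two model queries.
    loss_clean = np.mean(np.abs(clean - target))
    loss_probe = np.mean(np.abs(forecast(x + theta) - target))
    grad_est = (loss_probe - loss_clean) / theta

    # Step against the estimated gradient to move the forecast toward the
    # target; each element changes by at most eps * mean(|x|).
    rho = -eps * scale * np.sign(grad_est)
    return x + rho


if __name__ == "__main__":
    x = np.sin(np.linspace(0.0, 6.0, 48)) + 2.0
    x_adv = dga_attack(toy_forecaster, x, eps=0.02)
    print("max input change:", np.max(np.abs(x_adv - x)))
    print("forecast MAE shift:",
          np.mean(np.abs(toy_forecaster(x_adv) - toy_forecaster(x))))
```

The sketch needs only two forecaster queries per attack, which is why this style of attack is feasible against API-only models. The per-element bound `eps * mean(|x|)` is what keeps the perturbation small relative to the signal in this illustration.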

Paper Structure

This paper contains 17 sections, 6 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Adversarial Black-box Attack for LLMs in Time Series Forecasting.
  • Figure 2: Robustness comparison between LLM-based forecasting models and lighter models. These figures highlight each model's relative robustness across various datasets. The blue and orange shaded areas represent the normalized increase in MAE for each model under the influence of DGA and GWN perturbations, respectively. A larger shaded area indicates greater vulnerability to perturbations.
  • Figure 3: (a) Inputs and predictions from LLMTime (using GPT-3.5) on the ETTh1 dataset; (b) Input bias and prediction errors corresponding to (a); (c) Inputs and predictions from LLMTime (using GPT-3.5) on the ETTh2 dataset; (d) Input bias and prediction errors corresponding to (c); (e) Inputs and predictions from TimeGPT on the weather dataset; (f) Input bias and prediction errors corresponding to (e). This figure highlights the greater disruption caused by DGA compared to GWN, showing significant deviations from the ground truth.
  • Figure 4: Prediction distribution comparison for LLMTime (using GPT-3.5, GPT-4) across different datasets under clean input, GWN, and DGA.
  • Figure 5: Autocorrelation function curve comparison on ETTh2 by LLMTime using GPT-3.5.
  • ...and 1 more figure