
AXIS: Explainable Time Series Anomaly Detection with Large Language Models

Tian Lan, Hao Duong Le, Jinbo Li, Wenjun He, Meng Wang, Chenghao Liu, Chen Zhang

TL;DR

AXIS addresses the need for semantic, pattern-level explanations in time-series anomaly detection by conditioning a frozen LLM with three complementary hints: a symbolic numeric hint that grounds the values of the target window, a context-rich, step-aligned representation from a pretrained encoder, and a global task-prior cue. This cross-modal alignment lets the LLM produce high-quality explanations without any change to its architecture, while a dedicated semantic benchmark supervises contextual grounding and pattern-level reasoning. Empirically, AXIS achieves state-of-the-art explanation quality and competitive anomaly detection across diverse datasets and LLM families, as judged by both automated metrics and human evaluators. The work advances explainable TSAD through principled prompt design, a two-phase training process, and reproducible benchmarks, aiming at more faithful, domain-aligned explanations in time-series analytics.

Abstract

Time-series anomaly detection (TSAD) increasingly demands explanations that articulate not only whether an anomaly occurred, but also what pattern it exhibits and why it is anomalous. Leveraging the strong explanatory capabilities of Large Language Models (LLMs), recent works have attempted to treat time series as text for explainable TSAD. However, this approach faces a fundamental challenge: LLMs operate on discrete tokens and struggle to process long, continuous signals directly. Consequently, naive time-to-text serialization lacks both contextual grounding and representation alignment between the two modalities. To address this gap, we introduce AXIS, a framework that conditions a frozen LLM for nuanced time-series understanding. Instead of direct serialization, AXIS enriches the LLM's input with three complementary hints derived from the series: (i) a symbolic numeric hint for numerical grounding, (ii) a context-integrated, step-aligned hint distilled from a pretrained time-series encoder to capture fine-grained dynamics, and (iii) a task-prior hint that encodes global anomaly characteristics. Furthermore, to enable robust evaluation of explainability, we introduce a new benchmark featuring multi-format questions and rationales that supervise contextual grounding and pattern-level semantics. Extensive experiments, including both LLM-based and human evaluations, demonstrate that AXIS yields explanations of significantly higher quality and achieves competitive detection accuracy compared to general-purpose LLMs, specialized time-series LLMs, and time-series vision-language models.
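The abstract names the three hints but does not say how they are fused into the frozen LLM's input. Below is a minimal sketch of one plausible arrangement, offered as an illustration and not as the authors' implementation. Every function name here (symbolic_numeric_hint, step_aligned_hint, task_prior_hint, assemble_inputs) is hypothetical. The sketch also assumes three things: a linear projection into the LLM embedding width, a task prior formed as a weighted sum of learned prototype vectors, and a simple sequence-axis concatenation. Random matrices stand in for the pretrained encoder, the learned parameters, and the LLM's token embeddings.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def symbolic_numeric_hint(window, decimals=2):
    """(i) Hypothetical: serialize the target window as fixed-precision text,
    which would then be tokenized as part of the prompt."""
    return ", ".join(f"{v:.{decimals}f}" for v in window)

def step_aligned_hint(encoder_states, proj):
    """(ii) Hypothetical: map per-step encoder states (T x d_enc) to the LLM
    embedding width (T x d_llm) with a trainable linear projection, so that
    row t still corresponds to time step t."""
    return matmul(encoder_states, proj)

def task_prior_hint(prior_bank, weights):
    """(iii) Hypothetical: one global vector formed as a weighted sum of
    learned prior prototypes (K x d_llm)."""
    return [sum(w * p[j] for w, p in zip(weights, prior_bank))
            for j in range(len(prior_bank[0]))]

def assemble_inputs(prompt_embeds, step_embeds, prior_vec):
    """Stack [task prior | step-aligned hints | prompt tokens] along the
    sequence axis; the frozen LLM would consume this sequence as-is."""
    return [prior_vec] + step_embeds + prompt_embeds

def rand_matrix(rows, cols, rng):
    """Random stand-in for encoder outputs or learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    window = [1.0, 1.1, 5.3, 1.0]
    print(symbolic_numeric_hint(window))  # 1.00, 1.10, 5.30, 1.00
    T, d_enc, d_llm = len(window), 8, 16
    steps = step_aligned_hint(rand_matrix(T, d_enc, rng), rand_matrix(d_enc, d_llm, rng))
    prior = task_prior_hint(rand_matrix(3, d_llm, rng), [0.2, 0.5, 0.3])
    # Stand-in for the frozen LLM's embeddings of a 10-token prompt.
    prompt = rand_matrix(10, d_llm, rng)
    seq = assemble_inputs(prompt, steps, prior)
    print(len(seq), len(seq[0]))  # 15 16
```

The point of the sketch is that only the projection and the prior prototypes would need training. The LLM's weights and architecture stay untouched, which is consistent with the "frozen LLM" claim above.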


Paper Structure

This paper contains 79 sections, 15 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Deep learning method for TSAD: (a) Opaque anomaly scores fail to explain why; (b) XAI features lack intuitive semantics;
  • Figure 2: Bridging the Semantic Gap in Time Series Anomaly Explanation. (a) Current LLM-based methods fail due to: (i) poor Contextual Grounding, where observing a local pattern (e.g., the "V-shape") in isolation prevents a meaningful diagnosis; and (ii) Representation Misalignment, where inputs of abstract statistics (e.g., "variance increased") lead to uninformative, circular explanations. (b) Our approach overcomes these limitations by producing contextualized, pattern-level explanations that align with expert reasoning.
  • Figure 3: AXIS constructs the prompt from three representation pathways: (i) symbolic numeric grounding via window values, (ii) context-integrated local dynamics through step-aligned hints that capture contextual information, and (iii) task-prior hints encoding global priors.
  • Figure 4: The architecture of our procedural engine for generating context-aware and comparative anomaly explanation benchmarks.
  • Figure 5: Visualization of (a) contextual grounding and (b) representation alignment abilities.
  • ...and 8 more figures