AXIS: Explainable Time Series Anomaly Detection with Large Language Models
Tian Lan, Hao Duong Le, Jinbo Li, Wenjun He, Meng Wang, Chenghao Liu, Chen Zhang
TL;DR
AXIS addresses the need for semantic, pattern-level explanations in time-series anomaly detection (TSAD) by conditioning a frozen LLM on three complementary hints: a symbolic numeric grounding of the target window, a context-rich, step-aligned representation from a pretrained time-series encoder, and a global task-prior cue. This cross-modal alignment lets the LLM produce high-quality explanations without any architectural modification, while a dedicated semantic benchmark supervises and evaluates contextual grounding and pattern-level reasoning. Empirically, AXIS achieves state-of-the-art explanation quality and competitive anomaly detection across diverse datasets and LLM families, as judged by both automated metrics and human evaluators. The work advances explainable TSAD through principled prompt design, a two-phase training process, and reproducible benchmarks, paving the way for more faithful, domain-aligned explanations in time-series analytics.
Abstract
Time-series anomaly detection (TSAD) increasingly demands explanations that articulate not only whether an anomaly occurred, but also what pattern it exhibits and why it is anomalous. Leveraging the explanatory capabilities of Large Language Models (LLMs), recent works have attempted to treat time series as text for explainable TSAD. However, this approach faces a fundamental challenge: LLMs operate on discrete tokens and struggle to process long, continuous signals directly. Consequently, naive time-to-text serialization lacks both contextual grounding and representation alignment between the two modalities. To address this gap, we introduce AXIS, a framework that conditions a frozen LLM for nuanced time-series understanding. Instead of relying on direct serialization alone, AXIS enriches the LLM's input with three complementary hints derived from the series: (i) a symbolic numeric hint for numerical grounding, (ii) a context-integrated, step-aligned hint distilled from a pretrained time-series encoder to capture fine-grained dynamics, and (iii) a task-prior hint that encodes global anomaly characteristics. Furthermore, to enable robust evaluation of explainability, we introduce a new benchmark featuring multi-format questions and rationales that supervise contextual grounding and pattern-level semantics. Extensive experiments, including both LLM-based and human evaluations, demonstrate that, compared with general-purpose LLMs, specialized time-series LLMs, and time-series Vision Language Models, AXIS yields significantly higher-quality explanations while achieving competitive detection accuracy.
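The abstract describes the three hints only at a high level. As a purely hypothetical illustration, the sketch below shows one way such hints could be combined into a single input sequence for a frozen LLM: the target window is serialized as fixed-precision text (symbolic numeric hint), per-step features from a stand-in encoder are linearly projected into the LLM embedding space (step-aligned hint), and a few learned vectors act as task-prior tokens. Every name, dimension, the serialization format, the linear projector, and the concatenation order are our assumptions, not the AXIS implementation; random arrays stand in for the pretrained encoder and the LLM's tokenizer and embedding table.

```python
import numpy as np

D_ENC, D_LLM = 16, 32  # hypothetical encoder / LLM hidden sizes
N_PRIOR = 4            # hypothetical number of task-prior soft tokens


def symbolic_numeric_hint(window: np.ndarray, decimals: int = 2) -> str:
    """Serialize the target window as fixed-precision text (assumed format)."""
    return " ".join(f"{x:.{decimals}f}" for x in window)


def step_aligned_hint(encoder_feats: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Map per-step encoder features (T, D_ENC) into LLM embedding space (T, D_LLM)."""
    return encoder_feats @ proj


def assemble_inputs(prior: np.ndarray, steps: np.ndarray, text: np.ndarray) -> np.ndarray:
    """Stack [task-prior | step-aligned | text-token] embeddings along the sequence axis."""
    return np.concatenate([prior, steps, text], axis=0)


rng = np.random.default_rng(0)
window = np.sin(np.linspace(0, 2 * np.pi, 8))
window[5] += 3.0  # inject a spike so the window contains a point anomaly

hint_text = symbolic_numeric_hint(window)
# Stand-in for a frozen tokenizer + embedding table: one random vector per whitespace token.
text_embeds = rng.normal(size=(len(hint_text.split()), D_LLM))

encoder_feats = rng.normal(size=(len(window), D_ENC))    # stand-in for a pretrained TS encoder
proj = rng.normal(size=(D_ENC, D_LLM)) / np.sqrt(D_ENC)  # trainable projector (assumed)
prior = rng.normal(size=(N_PRIOR, D_LLM))                # learned task-prior tokens (assumed)

inputs = assemble_inputs(prior, step_aligned_hint(encoder_feats, proj), text_embeds)
print(inputs.shape)  # (20, 32): 4 prior + 8 step-aligned + 8 text tokens
```

In this sketch only `proj` and `prior` would be trainable, which mirrors the abstract's point that the LLM itself stays frozen; how AXIS actually parameterizes and orders its hints is not specified here.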
