AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection
Junru Zhang, Lang Feng, Haoran Shi, Xu Guo, Han Yu, Yabo Dong, Duanqing Xu
TL;DR
AnomSeer tackles time-series anomaly detection (TSAD) with multimodal LLMs by grounding reasoning in precise time-series structure. It introduces expert chain-of-thought (ExpCoT) traces derived from classical TSAD priors, together with Time-Series Grounded Policy Optimization (TimerPO), which uses optimal transport to align model reasoning with ExpCoT and applies an orthogonal projection so that this auxiliary signal does not interfere with the primary objective. Trained with RL on synthetic data and evaluated across diverse benchmarks, AnomSeer achieves state-of-the-art classification and localization while generating verifiable, fine-grained reasoning traces. The approach generalizes well to unseen and real-world anomalies and offers a practical path to faithful, interpretable TSAD with compact backbones.
Abstract
Time-series anomaly detection (TSAD) with multimodal large language models (MLLMs) is an emerging area, yet a persistent challenge remains: MLLMs rely on coarse time-series heuristics and struggle with the multi-dimensional, detailed reasoning that is vital for understanding complex time-series data. We present AnomSeer to address this by reinforcing the model to ground its reasoning in precise structural details of time series, unifying anomaly classification, localization, and explanation. At its core, an expert chain-of-thought trace is generated to provide verifiable, fine-grained reasoning derived from classical analyses (e.g., statistical measures, frequency transforms). Building on this, we propose a novel time-series grounded policy optimization (TimerPO) that adds two components to standard reinforcement learning: a time-series grounded advantage based on optimal transport, and an orthogonal projection that ensures this auxiliary granular signal does not interfere with the primary detection objective. Across diverse anomaly scenarios, AnomSeer, with Qwen2.5-VL-3B/7B-Instruct, outperforms larger commercial baselines (e.g., GPT-4o) in classification and localization accuracy, particularly on point- and frequency-driven anomalies. Moreover, it produces plausible time-series reasoning traces that support its conclusions.
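To make the orthogonal-projection idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the primary and auxiliary advantages can each be represented as a vector over a group of sampled responses, and it removes from the auxiliary vector the component that lies along the primary one. The function name `project_out`, the example values, and the mixing weight `0.5` are hypothetical; the abstract does not specify the space in which TimerPO performs the projection.

```python
import numpy as np

def project_out(aux, primary, eps=1e-12):
    """Return the component of `aux` that is orthogonal to `primary`.

    Hypothetical helper: standard vector rejection, used here only to
    illustrate how an auxiliary signal can be made non-interfering.
    """
    aux = np.asarray(aux, dtype=float)
    primary = np.asarray(primary, dtype=float)
    denom = float(primary @ primary)
    if denom < eps:
        # Primary signal is (near) zero, so there is nothing to project out.
        return aux.copy()
    return aux - (aux @ primary) / denom * primary

# Illustrative values only (not taken from the paper).
primary_adv = np.array([1.0, -1.0, 0.5, -0.5])   # e.g. detection-reward advantage
aux_adv = np.array([0.8, 0.2, -0.3, 0.1])        # e.g. reasoning-alignment advantage

aux_orth = project_out(aux_adv, primary_adv)

# Assumed mixing weight; the combined signal keeps the same component
# along the primary direction as the primary advantage alone.
combined = primary_adv + 0.5 * aux_orth
```

Under these assumptions, `aux_orth` has zero inner product with `primary_adv`, so adding it in any proportion leaves the projection of the combined signal onto the primary direction unchanged. That is one way to read "does not interfere with the primary detection objective".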
