AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection

Junru Zhang; Lang Feng; Haoran Shi; Xu Guo; Han Yu; Yabo Dong; Duanqing Xu

TL;DR

AnomSeer tackles time-series anomaly detection (TSAD) with multimodal LLMs by grounding the model's reasoning in precise time-series structure. It introduces expert chain-of-thought (ExpCoT) traces derived from classical TSAD priors, together with Time-Series Grounded Policy Optimization (TimerPO), which uses optimal transport to align model reasoning with ExpCoT and integrates this signal orthogonally so that it does not interfere with the primary objective. Trained with reinforcement learning on synthetic data and evaluated across diverse benchmarks, AnomSeer achieves state-of-the-art classification and localization while generating verifiable, fine-grained reasoning traces. The approach generalizes well to unseen and real-world anomalies and offers a practical path to faithful, interpretable TSAD with compact backbones.

Abstract

Time-series anomaly detection (TSAD) with multimodal large language models (MLLMs) is an emerging area, yet a persistent challenge remains: MLLMs rely on coarse time-series heuristics and struggle with the multi-dimensional, detailed reasoning that is vital for understanding complex time-series data. We present AnomSeer to address this by reinforcing the model to ground its reasoning in precise, structural details of time series, unifying anomaly classification, localization, and explanation. At its core, an expert chain-of-thought trace is generated to provide verifiable, fine-grained reasoning from classical analyses (e.g., statistical measures, frequency transforms). Building on this, we propose a novel time-series grounded policy optimization (TimerPO) that incorporates two additional components beyond standard reinforcement learning: a time-series grounded advantage based on optimal transport, and an orthogonal projection that ensures this auxiliary granular signal does not interfere with the primary detection objective. Across diverse anomaly scenarios, AnomSeer, with Qwen2.5-VL-3B/7B-Instruct, outperforms larger commercial baselines (e.g., GPT-4o) in classification and localization accuracy, particularly on point- and frequency-driven exceptions. Moreover, it produces plausible time-series reasoning traces that support its conclusions.
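
The two TimerPO components named in the abstract can be illustrated with a small sketch. This is not the paper's formulation, whose equations are not reproduced on this page. It assumes that advantages are plain vectors over a group of sampled responses, that a one-dimensional Wasserstein distance stands in for the optimal-transport alignment cost, and that "orthogonal" means removing the component of the auxiliary advantage that lies along the outcome advantage. The function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def wasserstein_1d(a, b):
    """1-D optimal-transport cost (W1) between two equal-size empirical samples.

    Assumption: a stand-in for the optimal-transport alignment the abstract
    mentions; the paper's actual cost and inputs are not given here.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal-size samples")
    # For equal-size 1-D samples, the optimal plan matches sorted values.
    return float(np.mean(np.abs(a - b)))


def orthogonal_component(aux, primary, eps=1e-12):
    """Return the part of `aux` that is orthogonal to `primary`."""
    aux = np.asarray(aux, dtype=float)
    primary = np.asarray(primary, dtype=float)
    denom = primary @ primary
    if denom < eps:
        # A zero primary vector defines no direction to remove.
        return aux.copy()
    return aux - (aux @ primary) / denom * primary


def combine_advantages(primary, aux, weight=0.5):
    """Add the orthogonalized auxiliary signal to the primary advantage.

    `weight` is a hypothetical mixing coefficient, not a value from the paper.
    """
    primary = np.asarray(primary, dtype=float)
    return primary + weight * orthogonal_component(aux, primary)


if __name__ == "__main__":
    outcome_adv = np.array([1.0, -1.0, 0.5, -0.5])   # made-up outcome advantages
    reasoning_adv = np.array([0.2, 0.4, -0.1, 0.3])  # made-up auxiliary advantages
    combined = combine_advantages(outcome_adv, reasoning_adv, weight=0.7)
    # The projection of the combined vector onto the outcome direction is unchanged.
    print(combined @ outcome_adv, outcome_adv @ outcome_adv)
```

Under these assumptions, the auxiliary signal can re-rank responses only in directions the outcome advantage does not already cover, which is one way to read the claim that it does not interfere with the primary detection objective.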

Paper Structure

This paper contains 36 sections, 8 equations, 17 figures, 4 tables, 1 algorithm.

Introduction
Related Work
Preliminary
Methodology
Expert Chain-of-Thought Generation
Time-Series Grounded Policy Optimization
Outcome-Aware Advantage
Time-Series Reasoning Advantage
Orthogonal Integration for Policy Optimization
Experiments
Main Results
Ablation Study and Hyperparameter Analysis
Effect of TimerPO on Reasoning Pattern
Generalization Performance
Conclusions and Limitations
...and 21 more sections

Figures (17)

Figure 1: Comparison of model performance and time-series reasoning quality. Left: Affinity F1 (%) of different models on TSAD benchmarks. Middle: GPT-4o results, including word frequency distributions in reasoning (top) and its coarse-grained answer (bottom). Right: AnomSeer results, including word frequency distributions in reasoning (top) and its fine-grained answer (bottom).
Figure 2: The overall framework of AnomSeer. AnomSeer first generates ExpCoT reasoning traces $\mathbf{y}^*$ from the time-series data based on classical TSAD techniques (e.g., FFT). TimerPO then computes the outcome-aware advantage and leverages optimal transport to compute the time-series reasoning advantage, which is orthogonally integrated into policy optimization to ensure stable training and improved reasoning quality.
Figure 3: An example of TSAD reasoning produced by AnomSeer.
Figure 4: Hyperparameter sensitivity analysis on $\alpha$, comparing our method with the GPT-4o baseline (grey dashed line).
Figure 5: Comparison of distribution alignment between ExpCoT (blue) and AnomSeer (red) outputs, as well as token usage before and after applying TimerPO.
...and 12 more figures
