Knowledge-Guided Textual Reasoning for Explainable Video Anomaly Detection via LLMs

Hari Lee

TL;DR

TbVAD tackles weakly supervised video anomaly detection by moving the entire reasoning pipeline into textual representations. It introduces a three-branch architecture: a Structured Knowledge Branch that builds multi-aspect textual priors from captions, a Text Understanding Branch that encodes fine-grained captions, and an Explainable Reasoning Branch that produces slot-wise importance scores, retrieved evidence, and natural-language explanations. Experiments on UCF-Crime and XD-Violence show performance competitive with vision-based baselines, with the added benefit of knowledge-grounded explanations. Ablation and cross-dataset analyses indicate that the four semantic slots (action, object, context, environment) contribute to robust, generalizable anomaly reasoning in real-world surveillance scenarios.

Abstract

We introduce Text-based Explainable Video Anomaly Detection (TbVAD), a language-driven framework for weakly supervised video anomaly detection (WSVAD) that performs both detection and explanation entirely within the textual domain. Unlike conventional WSVAD models that rely on explicit visual features, TbVAD represents video semantics through language, enabling interpretable and knowledge-grounded reasoning. The framework operates in three stages: (1) transforming video content into fine-grained captions using a vision-language model, (2) constructing structured knowledge by organizing the captions into four semantic slots (action, object, context, environment), and (3) generating slot-wise explanations that reveal which semantic factors contribute most to the anomaly decision. We evaluate TbVAD on two public benchmarks, UCF-Crime and XD-Violence, and the results indicate that textual knowledge reasoning supports interpretable and reliable anomaly detection in real-world surveillance scenarios.
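
To make stages (2) and (3) more concrete, the following is a minimal sketch of how caption content might be held in the four semantic slots and how slot-wise importance could be turned into a short explanation. It is an illustration under stated assumptions, not the TbVAD implementation: the names `SlotKnowledge`, `slot_importance`, and `explain`, the use of a softmax over per-slot scores, and the example scores are all hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict

# The four semantic slots named in the abstract.
SLOTS = ("action", "object", "context", "environment")


@dataclass
class SlotKnowledge:
    """Hypothetical container for caption text organized by slot."""
    action: str
    object: str
    context: str
    environment: str


def slot_importance(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax-normalize per-slot relevance scores (an assumed scheme)."""
    peak = max(scores[s] for s in SLOTS)  # subtract max for numerical stability
    exps = {s: math.exp(scores[s] - peak) for s in SLOTS}
    total = sum(exps.values())
    return {s: exps[s] / total for s in SLOTS}


def explain(knowledge: SlotKnowledge, importance: Dict[str, float]) -> str:
    """Report the slot with the highest importance and its caption text."""
    top = max(SLOTS, key=lambda s: importance[s])
    text = getattr(knowledge, top)
    return f"Most influential slot: {top} ({text!r}, weight {importance[top]:.2f})"


if __name__ == "__main__":
    knowledge = SlotKnowledge(
        action="a person smashes a car window",
        object="car, metal bar",
        context="no other people nearby",
        environment="parking lot at night",
    )
    # Illustrative raw scores; in practice these would come from a trained model.
    raw_scores = {"action": 2.0, "object": 0.5, "context": 0.1, "environment": 0.0}
    print(explain(knowledge, slot_importance(raw_scores)))
```

The softmax is used here only so that the slot weights are positive and sum to one, which makes them easy to read as relative contributions; the paper's actual scoring mechanism may differ.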
