Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection

Hyogun Lee; Haksub Kim; Ig-Jae Kim; Yonghun Choi

TL;DR

Flashback addresses zero-shot, real-time video anomaly detection in two steps: a frozen LLM builds a caption memory offline, and a cross-modal encoder retrieves from that memory online. It introduces repulsive prompting and scaled anomaly penalization to reduce embedding bias and improve discrimination, and it produces per-segment anomaly scores and human-readable captions without any online LLM calls. On UCF-Crime and XD-Violence, it reports state-of-the-art zero-shot performance (in AUC and AP, respectively) and real-time throughput on a consumer-grade GPU. The approach offers practical, explainable VAD for large-scale surveillance, while noting potential biases and areas for future work.

Abstract

Video Anomaly Detection (VAD) automatically identifies anomalous events from video, mitigating the need for human operators in large-scale surveillance deployments. However, two fundamental obstacles hinder real-world adoption: domain dependency and real-time constraints, the latter requiring near-instantaneous processing of incoming video. To this end, we propose Flashback, a zero-shot and real-time video anomaly detection paradigm. Inspired by the human cognitive mechanism of instantly judging anomalies and reasoning in current scenes based on past experience, Flashback operates in two stages: Recall and Respond. In the offline Recall stage, an off-the-shelf LLM builds a pseudo-scene memory of both normal and anomalous captions without any reliance on real anomaly data. In the online Respond stage, incoming video segments are embedded and matched against this memory via similarity search. By eliminating all LLM calls at inference time, Flashback delivers real-time VAD even on a consumer-grade GPU. On two large datasets from real-world surveillance scenarios, UCF-Crime and XD-Violence, we achieve 87.3 AUC (+7.0 pp) and 75.1 AP (+13.1 pp), respectively, outperforming prior zero-shot VAD methods by large margins.
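
To make the Respond stage concrete, the following is a minimal sketch of matching one segment embedding against a caption memory by cosine similarity. It is an illustration under stated assumptions, not the authors' implementation: the function names (`l2_normalize`, `respond`), the hand-made 3-dimensional embeddings, the toy captions, and the softmax-weighted top-k vote used as the anomaly score are all hypothetical. The cross-modal encoder, repulsive prompting, and scaled anomaly penalization are not reproduced here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def respond(segment_emb, memory_emb, memory_is_anomalous, memory_captions, top_k=2):
    """Score one video-segment embedding against a precomputed caption memory.

    Assumption: the score is a softmax-weighted vote over the top-k most
    similar memory entries. This is an illustrative rule only.
    """
    seg = l2_normalize(np.asarray(segment_emb, dtype=float))
    mem = l2_normalize(np.asarray(memory_emb, dtype=float))
    sims = mem @ seg                      # cosine similarity per memory entry
    top = np.argsort(-sims)[:top_k]       # indices of the k nearest captions
    weights = np.exp(sims[top])
    weights /= weights.sum()              # softmax over the retrieved entries
    labels = np.asarray(memory_is_anomalous, dtype=float)[top]
    score = float(weights @ labels)       # value in [0, 1]; higher = more anomalous
    caption = memory_captions[int(top[0])]  # nearest caption doubles as the explanation
    return score, caption


# Toy memory standing in for the offline Recall stage (hypothetical data).
MEMORY_CAPTIONS = [
    "people walking along a sidewalk",
    "cars waiting at a traffic light",
    "a person smashing a shop window",
    "two people fighting in a parking lot",
]
MEMORY_IS_ANOMALOUS = [0, 0, 1, 1]
MEMORY_EMB = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

if __name__ == "__main__":
    for name, seg in [("segment A", [0.1, 1.0, 0.0]), ("segment B", [1.0, 0.05, 0.0])]:
        score, caption = respond(seg, MEMORY_EMB, MEMORY_IS_ANOMALOUS, MEMORY_CAPTIONS)
        print(f"{name}: score={score:.2f}, caption={caption!r}")
```

Because the memory is embedded once offline, the per-segment cost in this sketch is one matrix-vector product and a partial sort, which is consistent with the abstract's point that no LLM call is needed at inference time.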
