TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection

Yousuf Ahmed Siddiqui; Sufiyaan Usmani; Umer Tariq; Jawwad Ahmed Shamsi; Muhammad Burhan Khan

TL;DR

TRACES addresses context-aware zero-shot video anomaly detection by integrating motion and appearance through temporal cross-attention and a large Traces Bank of contextual embeddings. The method keeps its large encoders frozen, adds lightweight adapters and cross-modal fusion, and scores anomalies by retrieval over text-derived context traces, without labeled anomaly data. It achieves state-of-the-art zero-shot performance on UCF-Crime and XD-Violence with real-time inference and interpretable cross-attention explanations, and it generalizes well to unseen events. The work advances practical surveillance deployment by combining context recall with open-set anomaly reasoning to give robust, low-latency detection in real-world settings.

Abstract

Whether a video event is anomalous often depends on its context and on how it evolves over time: an action that is normal in one setting can be anomalous in another. Most anomaly detectors, however, ignore this kind of context, which seriously limits their ability to generalize to new, real-life situations. Our work addresses context-aware zero-shot anomaly detection, in which a system must adapt to detect new events by correlating temporal and appearance features with textual memory traces in real time. Our approach defines a memory-augmented pipeline that correlates temporal signals with visual embeddings using cross-attention and performs real-time zero-shot anomaly classification by contextual similarity scoring. We achieve 90.4% AUC on UCF-Crime and 83.67% AP on XD-Violence, a new state of the art among zero-shot models. Our model achieves real-time inference with high precision and explainability for deployment. We show that combining cross-attention temporal fusion with contextual memory yields high-fidelity anomaly detection, a step towards the applicability of zero-shot models in real-world surveillance and infrastructure monitoring.
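
The abstract does not spell out how contextual similarity scoring is computed, so the following is only a minimal sketch of one plausible reading: a clip embedding is compared by cosine similarity against a small bank of text-derived traces, the top-k matches are retrieved, and the anomaly score is the softmax-weighted share of retrieved traces tagged as anomalous. The function names, the `(embedding, is_anomalous)` trace layout, the values of `k` and `temperature`, and the toy 3-dimensional vectors are all assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def anomaly_score(clip_embedding, traces, k=3, temperature=0.1):
    """Score a clip in [0, 1] by retrieval over a bank of context traces.

    traces: list of (embedding, is_anomalous) pairs. This layout is a
    hypothetical stand-in for the paper's Traces Bank, assumed for this sketch.
    """
    # Retrieve the k traces most similar to the clip embedding.
    ranked = sorted(
        ((cosine(clip_embedding, emb), is_anom) for emb, is_anom in traces),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Softmax weights over the retrieved similarities.
    weights = [math.exp(sim / temperature) for sim, _ in ranked]
    total = sum(weights)
    # Weighted share of the retrieved traces that describe anomalous context.
    return sum(w for w, (_, is_anom) in zip(weights, ranked) if is_anom) / total


# Toy bank: two "normal" traces and two "anomalous" traces (made-up vectors).
toy_traces = [
    ([1.0, 0.0, 0.0], False),
    ([0.9, 0.1, 0.0], False),
    ([0.0, 1.0, 0.0], True),
    ([0.0, 0.9, 0.1], True),
]

normal_clip = [1.0, 0.05, 0.0]
odd_clip = [0.05, 1.0, 0.0]

normal_score = anomaly_score(normal_clip, toy_traces)
odd_score = anomaly_score(odd_clip, toy_traces)
print(f"normal clip: {normal_score:.4f}, odd clip: {odd_score:.4f}")
```

In this toy setup the clip that lies close to the normal traces receives a score near 0 and the clip that lies close to the anomalous traces receives a score near 1. In a real pipeline the embeddings would come from the fused video features and a text encoder rather than from hand-written vectors.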
