Content-based Video Retrieval in Traffic Videos using Latent Dirichlet Allocation Topic Model
Mohammad Kianpisheh
TL;DR
This work tackles content-based video retrieval in traffic surveillance where unsupervised topic models yield ambiguous high-level topics. It introduces a multi-stage pipeline that learns separate LDA models per visual feature, applies clip blob and topic blob decompositions, and performs topic direction decomposition to obtain primitive, unambiguous topics. Retrieval is supported by multiple strategies, including sequence-based search via Smith-Waterman on primitive-topic sequences, plus single-topic, co-occurrence, and similar-clips queries, all via a lightweight, sparse database. GPU-accelerated feature extraction further speeds up processing, yielding substantial improvements in retrieval accuracy and efficiency across three real-world datasets. The approach enables flexible query formulation and scalable indexing suitable for large-scale surveillance deployments.
Abstract
Content-based video retrieval is one of the most challenging tasks in surveillance systems. In this study, the Latent Dirichlet Allocation (LDA) topic model is used to annotate surveillance videos in an unsupervised manner. In scene understanding methods, some of the learned patterns are ambiguous and represent a mixture of atomic actions. To address this ambiguity, the proposed method processes the feature vectors and the primary model to obtain a secondary model that describes the scene with primitive, unambiguous patterns. Experiments show improved retrieval performance compared to other topic model-based methods. In terms of false positive and true positive responses, the proposed method achieves at least 80% and 124% improvement, respectively. Four search strategies are proposed, and users can define and search for a variety of activities using the proposed query formulation, which is based on topic models. In addition, the lightweight database in our method occupies much less storage, which in turn speeds up the search procedure compared to methods based on low-level features.
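As an illustration of the sequence-based search strategy, the following minimal sketch runs Smith-Waterman local alignment between a query and a stored clip sequence, both written as lists of primitive-topic indices. The function name `smith_waterman`, the integer topic encoding, and the scoring values (`match`, `mismatch`, `gap`) are assumptions made for this example, not the settings used in this work.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-1):
    """Local alignment of two topic-index sequences.

    Returns (best_score, (i, j)), where i and j are the 1-based end
    positions of the best local alignment in query and target.
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] holds the best score of an alignment ending at query[i-1], target[j-1].
    H = [[0] * cols for _ in range(rows)]
    best_score, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(
                0,                      # restart: local alignment never goes negative
                H[i - 1][j - 1] + step, # align the two topics
                H[i - 1][j] + gap,      # skip a topic in the query
                H[i][j - 1] + gap,      # skip a topic in the target
            )
            if H[i][j] > best_score:
                best_score, best_pos = H[i][j], (i, j)
    return best_score, best_pos


# Hypothetical example: the query activity is the topic sequence 3 -> 7 -> 2.
exact_score, exact_end = smith_waterman([3, 7, 2], [5, 3, 7, 2, 9])
gapped_score, gapped_end = smith_waterman([3, 7, 2], [3, 7, 4, 2])
```

Under these assumed scores, a stored sequence that contains the query exactly scores higher than one with an extra topic in between, so ranking clips by alignment score tolerates small deviations while still favoring close matches.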
