Content-based Video Retrieval in Traffic Videos using Latent Dirichlet Allocation Topic Model

Mohammad Kianpisheh

TL;DR

This work tackles content-based video retrieval in traffic surveillance where unsupervised topic models yield ambiguous high-level topics. It introduces a multi-stage pipeline that learns separate LDA models per visual feature, applies clip blob and topic blob decompositions, and performs topic direction decomposition to obtain primitive, unambiguous topics. Retrieval is supported by multiple strategies, including sequence-based search via Smith-Waterman on primitive-topic sequences, plus single-topic, co-occurrence, and similar-clips queries, all via a lightweight, sparse database. GPU-accelerated feature extraction further speeds up processing, yielding substantial improvements in retrieval accuracy and efficiency across three real-world datasets. The approach enables flexible query formulation and scalable indexing suitable for large-scale surveillance deployments.
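To make the sequence-based search concrete, the following is a minimal sketch of Smith-Waterman local alignment applied to sequences of primitive-topic IDs. It is an illustration under stated assumptions, not the paper's implementation: the function name `smith_waterman`, the integer topic IDs, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are all hypothetical choices, since this summary does not give the scoring scheme actually used.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-1):
    """Score the best local alignment between two topic-ID sequences.

    Returns (best_score, (i, j)), where i and j are the 1-based positions in
    `query` and `target` at which the best local alignment ends.
    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] holds the best score of a local alignment ending at
    # query[i-1] and target[j-1]; scores are floored at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(
                0,                       # start a new local alignment
                H[i - 1][j - 1] + step,  # align the two topics
                H[i - 1][j] + gap,       # skip a topic in the query
                H[i][j - 1] + gap,       # skip a topic in the target
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


# Hypothetical example: a query for the topic sequence 3 -> 7 -> 2
# matched against the topic sequence of an indexed clip.
score, end = smith_waterman([3, 7, 2], [5, 3, 7, 2, 9])
```

Because the alignment is local, a query can match a short subsequence inside a much longer clip, and the gap penalty lets it tolerate an unrelated topic appearing between the queried ones at a reduced score.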

Abstract

Content-based video retrieval is one of the most challenging tasks in surveillance systems. In this study, the Latent Dirichlet Allocation (LDA) topic model is used to annotate surveillance videos in an unsupervised manner. In scene understanding methods, some of the learned patterns are ambiguous and represent a mixture of atomic actions. To address this ambiguity, the proposed method processes the feature vectors and the primary model to obtain a secondary model that describes the scene with primitive, unambiguous patterns. Experiments show improved retrieval performance compared to other topic model-based methods. In terms of false positive and true positive responses, the proposed method achieves at least 80% and 124% improvement, respectively. Four search strategies are proposed, and users can define and search for a variety of activities using the proposed query formulation, which is based on topic models. In addition, the lightweight database in our method occupies much less storage, which in turn speeds up the search procedure compared to methods based on low-level features.

Paper Structure

This paper contains 17 sections, 2 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Procedure for going from low-level features to high-level patterns. First, visual features are extracted from each document, which is a bag of visual features. Then, the documents are fed into the LDA model to discover semantically meaningful patterns (topics) in the scene.
  • Figure 2: An ambiguous topic including multiple minor actions.
  • Figure 3: Framework of the proposed method. In the learning phase, blob decomposition is performed on the extracted features, and the new feature vectors are fed into the corresponding LDA topic model. Topics of the primary model are processed to form the secondary model with primitive topics. The user's query formulation is parsed into the internal representation of the system, and the search is performed using the proposed search strategies.
  • Figure 4: Learned persistence topics: (a) a topic obtained without the clip blob decomposition step; (b) and (c) spatially separate persistence topics obtained with the clip blob decomposition step.
  • Figure 5: Sample topics learned using LDA model for (first row) the compound feature vector, (second row) the motion visual feature, and (third row) the persistence visual feature.
  • ...and 11 more figures