Accurate and Efficient Two-Stage Gun Detection in Video

Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu

TL;DR

This work tackles gun detection in videos, a challenging problem due to the tiny size of firearms and limited domain-specific labeled data. It first analyzes existing video classification methods and shows they underperform on gun detection, motivating a two-stage solution. The proposed approach uses image-augmented training of pre-trained image models to extract spatial gun features, followed by a sequence model to handle temporal context, and then applies an object detector (YOLOv11) only to videos flagged as Gun. Grad-CAM visualizations provide interpretability, and experimental results on synthetic firearm datasets and real-world CCTV data demonstrate significant gains in both detection accuracy and runtime efficiency compared with detection-only baselines.

Abstract

Object detection in videos plays a crucial role in advancing applications such as public safety and anomaly detection. Existing methods have explored different techniques, including CNNs, other deep learning models, and Transformers, for object detection and video classification. However, detecting tiny objects, e.g., guns, in videos remains challenging due to their small scale and varying appearances in complex scenes. Moreover, existing video analysis models for classification or detection often perform poorly in real-world gun detection scenarios because labeled video datasets for training are limited. Thus, it is imperative to develop efficient methods that capture tiny object features and to design models capable of accurate gun detection in real-world videos. To address these challenges, we make three original contributions in this paper. First, we conduct an empirical study of several existing video classification and object detection methods to identify guns in videos. Our extensive analysis shows that these methods may not accurately detect guns in videos. Second, we propose a novel two-stage gun detection method. In stage 1, we train an image-augmented model to effectively classify "Gun" videos. To make the detection more precise and efficient, stage 2 employs an object detection model to locate the exact region of the gun within video frames for videos classified as "Gun" by stage 1. Third, our experimental results demonstrate that the proposed domain-specific method achieves significant performance improvements and enhances efficiency compared with existing techniques. We also discuss challenges and future research directions in gun detection tasks in computer vision.
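
The two-stage control flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names two_stage_gun_detection, classify_video, detect_frame, and VideoResult are hypothetical, the 0.5 threshold is an assumed default, and the classifier and detector are passed in as plain callables so that any stage-1 video classifier and any stage-2 object detector could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")                 # any frame representation (e.g. an image array)
Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) box format


@dataclass
class VideoResult:
    label: str                           # "Gun" or "No Gun"
    gun_score: float                     # stage-1 score for the "Gun" class
    boxes_per_frame: List[List[Box]] = field(default_factory=list)


def two_stage_gun_detection(
    frames: Sequence[Frame],
    classify_video: Callable[[Sequence[Frame]], float],  # hypothetical stage-1 model
    detect_frame: Callable[[Frame], List[Box]],          # hypothetical stage-2 detector
    threshold: float = 0.5,                              # assumed decision threshold
) -> VideoResult:
    # Stage 1: classify the whole video as "Gun" or "No Gun".
    score = classify_video(frames)
    if score < threshold:
        # The detector is skipped entirely for videos not flagged as "Gun".
        return VideoResult("No Gun", score)

    # Stage 2: localize guns frame by frame, only for flagged videos.
    boxes = [detect_frame(frame) for frame in frames]
    return VideoResult("Gun", score, boxes)
```

The efficiency argument rests on the early return: a video that stage 1 labels "No Gun" never reaches the per-frame detector, so the detector's cost is paid only on flagged videos.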

Paper Structure

This paper contains 18 sections, 7 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of the proposed classification-oriented gun detection method for videos.
  • Figure 2: Class activation heatmaps for downsampled video clips and gun images, generated by Grad-CAM (Selvaraju et al., 2017).
  • Figure 3: ROC curves and AUC scores of our proposed method with different configurations on the UCF-Crime dataset.
  • Figure 4: Visual examples of correct, missing, and false detections by the classification-oriented gun detection method, with inference time and video length.