Interpretable Dynamic Graph Neural Networks for Small Occluded Object Detection and Tracking

Shahriar Soudeep, Md Abrar Jahin, M. F. Mridha

TL;DR

DGNN-YOLO tackles the detection and tracking of small, occluded objects in urban traffic by integrating YOLO11 with a Dynamic Graph Neural Network that updates spatial-temporal graphs in real time. The method combines robust detection, adaptive graph-based tracking, and explainable AI through Grad-CAM, Grad-CAM++, and Eigen-CAM to make its decisions interpretable. Empirical results on the i2 Object Detection Dataset show that it outperforms the baselines (e.g., Precision 0.8382, Recall 0.6875, mAP@0.5:0.95 0.6476), supported by ablations and interpretability analyses. The work advances real-time intelligent transportation systems by delivering accurate, explainable small-object detection and tracking, while acknowledging limitations in extreme weather and on rare classes, and proposing future enhancements such as LiDAR fusion and edge deployment.

Abstract

The detection and tracking of small, occluded objects such as pedestrians, cyclists, and motorbikes pose significant challenges for traffic surveillance systems because of their erratic movement, frequent occlusion, and poor visibility in dynamic urban environments. Conventional detectors such as YOLO11, while proficient in spatial feature extraction for precise detection, often struggle with these small and dynamically moving objects, particularly in handling real-time data updates and resource efficiency. This paper introduces DGNN-YOLO, a novel framework that integrates dynamic graph neural networks (DGNNs) with YOLO11 to address these limitations. DGNNs are chosen over standard GNNs for their ability to update graph structures in real time, which enables adaptive and robust tracking of objects in highly variable urban traffic scenarios. The framework constructs and regularly updates its graph representations, capturing objects as nodes and their interactions as edges, and so responds effectively to rapidly changing conditions. Additionally, DGNN-YOLO incorporates Grad-CAM, Grad-CAM++, and Eigen-CAM visualization techniques to enhance interpretability and foster trust, offering insights into the model's decision-making process. Extensive experiments validate the framework's performance, achieving a precision of 0.8382, recall of 0.6875, and mAP@0.5:0.95 of 0.6476, significantly outperforming existing methods. This study offers a scalable and interpretable solution for real-time traffic surveillance and advances the capabilities of intelligent transportation systems by addressing the critical challenge of detecting and tracking small, occluded objects.
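
The graph representation described in the abstract (objects as nodes, interactions as edges, rebuilt as the scene changes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Detection` class, the `build_frame_graph` function, the `max_dist` threshold, and the rule that two objects interact when their box centers are close are hypothetical and are not taken from the paper, whose DGNN learns its own node and edge features.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Detection:
    """One detected object: box corners in pixels plus a class label (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str

    @property
    def center(self):
        # Midpoint of the bounding box.
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def build_frame_graph(detections, max_dist=100.0):
    """Build one frame's graph.

    Nodes are the detections themselves. An edge (i, j) with i < j is added
    when the two box centers lie within max_dist pixels; its value is that
    distance. The proximity rule is an assumption standing in for the
    learned interactions of a real DGNN.
    """
    nodes = list(detections)
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(nodes), 2):
        dist = hypot(a.center[0] - b.center[0], a.center[1] - b.center[1])
        if dist <= max_dist:
            edges[(i, j)] = dist
    return nodes, edges


# Two consecutive frames with made-up detections: the cyclist moves away
# from the pedestrian and ends up near the motorbike.
frames = [
    [
        Detection(0, 0, 20, 40, "pedestrian"),
        Detection(30, 0, 50, 40, "cyclist"),
        Detection(400, 300, 460, 340, "motorbike"),
    ],
    [
        Detection(5, 0, 25, 40, "pedestrian"),
        Detection(320, 280, 360, 320, "cyclist"),
        Detection(390, 300, 450, 340, "motorbike"),
    ],
]

# Rebuilding the graph per frame is the simplest form of a "dynamic" graph.
graphs = [build_frame_graph(frame) for frame in frames]

for t, (nodes, edges) in enumerate(graphs):
    print(f"frame {t}: {len(nodes)} nodes, edges = {sorted(edges)}")
```

In this toy input the only edge in the first frame links the pedestrian and the cyclist, while in the second frame it links the cyclist and the motorbike, which shows how the edge set changes as objects move even though the node count stays the same.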

Paper Structure

This paper contains 50 sections, 12 equations, 10 figures, 5 tables, and 1 algorithm.

Figures (10)

  • Figure 1: A snapshot of an urban traffic environment illustrating challenges in detecting and tracking small, occluded objects. Green bounding boxes highlight rickshaws and small vehicles; red boxes mark motorcycles; yellow boxes indicate pedestrians; pink boxes correspond to larger vehicles, such as trucks. Occlusion, congestion, and unpredictable movements of small objects complicate reliable detection and tracking, emphasizing the need for advanced surveillance systems to improve safety and traffic flow.
  • Figure 2: (a) Overview of the proposed DGNN-YOLO framework for small object detection and tracking in traffic videos, (b) YOLO11 architecture for small object detection, and (c) DGNN architecture for object tracking.
  • Figure 3: Class-wise distribution of the i2 Object Detection Dataset.
  • Figure 4: Performance metrics across epochs for the proposed DGNN-YOLO.
  • Figure 5: Validation results of DGNN-YOLO showing small object detection and tracking.
  • ...and 5 more figures