Interpretable Dynamic Graph Neural Networks for Small Occluded Object Detection and Tracking

Shahriar Soudeep; Md Abrar Jahin; M. F. Mridha

TL;DR

DGNN-YOLO tackles the problem of detecting and tracking small, occluded objects in urban traffic by integrating YOLO11 with a Dynamic Graph Neural Network that updates spatial-temporal graphs in real time. The method combines detection, adaptive graph-based tracking, and explainable AI through Grad-CAM, Grad-CAM++, and Eigen-CAM to make its decisions interpretable. Empirical results on the i2 Object Detection Dataset show that it outperforms the baselines (Precision 0.8382, Recall 0.6875, mAP@0.5:0.95 0.6476), supported by ablations and interpretability analyses. The work advances real-time intelligent transportation systems by delivering accurate, explainable small-object detection and tracking. It acknowledges limitations in extreme weather and on rare classes, and proposes future enhancements such as LiDAR fusion and edge deployment.

Abstract

Detecting and tracking small, occluded objects such as pedestrians, cyclists, and motorbikes is a significant challenge for traffic surveillance systems because of their erratic movement, frequent occlusion, and poor visibility in dynamic urban environments. Detectors such as YOLO11 are proficient at spatial feature extraction and precise detection, but they often struggle with small, dynamically moving objects, particularly with respect to real-time data updates and resource efficiency. This paper introduces DGNN-YOLO, a novel framework that integrates dynamic graph neural networks (DGNNs) with YOLO11 to address these limitations. DGNNs are chosen over standard GNNs because they can update graph structures in real time, which enables adaptive and robust tracking of objects in highly variable urban traffic scenarios. The framework constructs and regularly updates its graph representations, capturing objects as nodes and their interactions as edges, and thus responds to rapidly changing conditions. In addition, DGNN-YOLO incorporates Grad-CAM, Grad-CAM++, and Eigen-CAM visualization techniques to enhance interpretability and foster trust by offering insights into the model's decision-making process. Extensive experiments validate the framework's performance: it achieves a precision of 0.8382, a recall of 0.6875, and an mAP@0.5:0.95 of 0.6476, significantly outperforming existing methods. This study offers a scalable and interpretable solution for real-time traffic surveillance and advances the capabilities of intelligent transportation systems by addressing the critical challenge of detecting and tracking small, occluded objects.
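
To make the "objects as nodes, interactions as edges" idea concrete, the following is a minimal sketch of rebuilding a graph for each video frame. It is not the authors' implementation: the abstract does not say how edges are defined, so the proximity rule used here (connect two objects whose box centers lie within a fixed radius) is an assumption, and the names `Detection`, `build_frame_graph`, and `radius` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Detection:
    """One detected object in a frame (hypothetical structure)."""
    track_id: int  # identity assigned by the tracker
    cx: float      # bounding-box center, x (pixels)
    cy: float      # bounding-box center, y (pixels)


def build_frame_graph(detections, radius):
    """Build a graph for a single frame.

    Nodes are detections keyed by track id. An undirected edge joins two
    detections whose centers are at most `radius` pixels apart. The
    proximity rule is an assumption made for illustration only.
    """
    nodes = {d.track_id: d for d in detections}
    edges = set()
    for a, b in combinations(detections, 2):
        if hypot(a.cx - b.cx, a.cy - b.cy) <= radius:
            # Store each undirected edge once, smaller id first.
            edges.add((min(a.track_id, b.track_id), max(a.track_id, b.track_id)))
    return nodes, edges


# Two consecutive frames: object 2 disappears (for example, it is occluded),
# object 3 moves closer to object 1, and object 4 enters the scene.
frame_1 = [Detection(1, 100, 100), Detection(2, 120, 110), Detection(3, 170, 150)]
frame_2 = [Detection(1, 105, 100), Detection(3, 140, 125), Detection(4, 600, 50)]

for index, frame in enumerate((frame_1, frame_2), start=1):
    nodes, edges = build_frame_graph(frame, radius=50.0)
    print(f"frame {index}: nodes={sorted(nodes)} edges={sorted(edges)}")
```

Rebuilding the graph from scratch on every frame is the simplest way to show a structure that changes over time. A dynamic GNN would additionally carry node features or hidden state from one frame to the next, which this sketch leaves out.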
