Depth as Prior Knowledge for Object Detection
Moussa Kassem Sbeyti, Nadja Klein
TL;DR
DepthPrior reframes depth as prior knowledge for object detectors rather than as an input for feature fusion. This addresses the systematic degradation in small and distant object detection caused by depth-induced heteroscedasticity. The authors formalize the depth-detection relationship and then introduce three modular components that operate during training and inference without modifying detector architectures: Depth-Based Loss Weighting (DLW), Depth-Based Loss Stratification (DLS), and Depth-Aware Confidence Thresholding (DCT). Across four diverse benchmarks and two detector families, DepthPrior yields consistent gains, notably up to +9% mAP$_S$ and +7% mAR$_S$ for small objects. It also enables depth-aware post-processing whose only overhead is depth estimation. The approach demonstrates that depth-informed supervision, even from monocular depth estimates, can meaningfully improve distant object recall and precision, offering a practical, plug-and-play solution for safety-critical perception tasks.
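The summary names the two training-side components but does not give their formulas. As a rough illustration only, the sketch below assumes one plausible reading of each: DLW as a per-object loss weight that grows linearly with object depth (rescaled to mean 1), and DLS as averaging the loss within depth bins before averaging across bins, so that many near objects cannot drown out a few distant ones. The function names, the linear weighting, the `alpha` parameter, and the bin `edges` are all hypothetical, not the authors' implementation.

```python
from bisect import bisect_right
from typing import Sequence


def depth_loss_weights(depths: Sequence[float], alpha: float = 1.0) -> list[float]:
    """Hypothetical DLW: per-object weights that grow linearly with depth.

    Assumes strictly positive depths. Weights are rescaled to mean 1 so the
    total loss keeps the same scale as the unweighted loss; alpha=0 gives
    uniform weights.
    """
    d_max = max(depths)
    raw = [1.0 + alpha * d / d_max for d in depths]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_loss(
    losses: Sequence[float], depths: Sequence[float], alpha: float = 1.0
) -> float:
    """Mean of per-object losses, each scaled by its depth-based weight."""
    weights = depth_loss_weights(depths, alpha)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


def stratified_loss(
    losses: Sequence[float], depths: Sequence[float], edges: Sequence[float]
) -> float:
    """Hypothetical DLS: average the loss inside each depth bin, then
    average the bin means, so every non-empty depth range counts equally."""
    bins: dict[int, list[float]] = {}
    for loss, d in zip(losses, depths):
        # bisect_right puts d in bin k where edges[k-1] <= d < edges[k]
        bins.setdefault(bisect_right(edges, d), []).append(loss)
    bin_means = [sum(v) / len(v) for v in bins.values()]
    return sum(bin_means) / len(bin_means)
```

For example, with per-object losses `[1, 1, 1, 1, 4]` at depths `[5, 6, 7, 8, 50]` and bin edges `[20, 40]`, the plain mean loss is 1.6, whereas `stratified_loss` returns 2.5 because the single distant object forms its own bin and counts as much as the four near ones combined.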
Abstract
Detecting small and distant objects remains challenging for object detectors due to scale variation, low resolution, and background clutter. Safety-critical applications require reliable detection of these objects for safe planning. Depth information can improve detection, but existing approaches require complex, model-specific architectural modifications. We provide a theoretical analysis followed by an empirical investigation of the depth-detection relationship. Together, they explain how depth causes systematic performance degradation and why depth-informed supervision mitigates it. We introduce DepthPrior, a framework that uses depth as prior knowledge rather than as a fused feature, providing benefits comparable to feature fusion without modifying detector architectures. DepthPrior consists of Depth-Based Loss Weighting (DLW) and Depth-Based Loss Stratification (DLS) during training, and Depth-Aware Confidence Thresholding (DCT) during inference. The only overhead is the initial cost of depth estimation. Experiments across four benchmarks (KITTI, MS COCO, VisDrone, SUN RGB-D) and two detectors (YOLOv11, EfficientDet) demonstrate the effectiveness of DepthPrior: it achieves up to +9% mAP$_S$ and +7% mAR$_S$ for small objects, with inference recovery rates as high as 95:1 (true vs. false detections). DepthPrior offers these benefits without additional sensors, architectural changes, or performance costs. Code is available at https://github.com/mos-ks/DepthPrior.
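The abstract does not specify the functional form of DCT or exactly how the 95:1 recovery rate is counted. The sketch below assumes a confidence threshold that decreases linearly with estimated depth, from `near_thr` at depth 0 to `far_thr` at `max_depth`. It reads the recovery rate as the ratio of true to false detections that the depth-aware threshold admits but a single global threshold would have rejected. Every name and default value here is a hypothetical choice for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    score: float   # detector confidence in [0, 1]
    depth: float   # estimated depth of the object, e.g. from a monocular depth map
    is_true: bool  # matched to ground truth (known only during evaluation)


def depth_threshold(
    depth: float, near_thr: float = 0.5, far_thr: float = 0.3, max_depth: float = 80.0
) -> float:
    """Hypothetical DCT: threshold interpolated linearly from near_thr to far_thr,
    clamped at far_thr beyond max_depth."""
    t = min(depth / max_depth, 1.0)
    return near_thr + t * (far_thr - near_thr)


def recovery_counts(dets: list[Detection], global_thr: float = 0.5, **kw) -> tuple[int, int]:
    """Count true and false detections kept by the depth-aware threshold
    but rejected by the global one (an assumed definition of 'recovery')."""
    recovered_true = 0
    added_false = 0
    for d in dets:
        kept_global = d.score >= global_thr
        kept_depth = d.score >= depth_threshold(d.depth, **kw)
        # With global_thr == near_thr and far_thr < near_thr, the depth-aware
        # threshold never exceeds the global one, so it only ever adds detections.
        if kept_depth and not kept_global:
            if d.is_true:
                recovered_true += 1
            else:
                added_false += 1
    return recovered_true, added_false
```

Under this reading, a reported ratio such as 95:1 would correspond to `recovered_true / added_false` over an evaluation set. Lowering the threshold only for distant objects targets the low-confidence regime where the abstract says small, far objects are lost.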
