Depth as Prior Knowledge for Object Detection

Moussa Kassem Sbeyti; Nadja Klein

TL;DR

DepthPrior reframes depth as prior knowledge for object detectors rather than as a feature-fusion input, addressing the systematic degradation in small/distant object detection caused by depth-induced heteroscedasticity. The authors formalize the depth-detection relationship, then introduce three modular components that operate during training and inference without modifying detector architectures: Depth-Based Loss Weighting (DLW), Depth-Based Loss Stratification (DLS), and Depth-Aware Confidence Thresholding (DCT). Across four diverse benchmarks and two detector families, DepthPrior yields consistent gains, notably up to +9% mAP$_S$ and +7% mAR$_S$ for small objects, and enables depth-aware post-processing whose only overhead is depth estimation. The approach demonstrates that depth-informed supervision, even from monocular depth estimates, can meaningfully improve recall and precision on distant objects, offering a practical, plug-and-play solution for safety-critical perception tasks.
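The Figure 1 caption gives the DLW weight, in simplified notation, as $w_i = 1 + \alpha \cdot \exp(d_{i,\text{norm}})$. The sketch below computes such weights and a weighted mean loss. It assumes that depths are already distances (larger means farther) and that normalization is a per-image min-max scaling to [0, 1]; the paper's own normalization and inversion steps are not shown in this excerpt, and `dlw_weights` / `dlw_weighted_loss` are hypothetical names.

```python
import math
from typing import List, Sequence


def dlw_weights(depths: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Hypothetical DLW-style weights w_i = 1 + alpha * exp(d_norm_i).

    Assumption: d_norm is a min-max scaling of the raw depths to [0, 1],
    so the nearest object gets d_norm = 0 and the farthest gets d_norm = 1.
    """
    d_min, d_max = min(depths), max(depths)
    span = d_max - d_min
    # Guard against a single depth value (all objects equally far).
    d_norm = [0.0] * len(depths) if span == 0 else [(d - d_min) / span for d in depths]
    return [1.0 + alpha * math.exp(x) for x in d_norm]


def dlw_weighted_loss(
    losses: Sequence[float], depths: Sequence[float], alpha: float = 1.0
) -> float:
    """Mean of per-object losses scaled by their depth-based weights (assumed reduction)."""
    weights = dlw_weights(depths, alpha)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Usage: three objects at 5 m, 10 m and 15 m.
print(dlw_weights([5.0, 10.0, 15.0], alpha=0.5))
```

With $\alpha = 0.5$ the nearest object gets weight 1.5 and the farthest about 2.36. Because the exponential is convex and increasing, the weight grows faster than linearly in normalized depth, and $\alpha$ sets how strongly distant objects are favored.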

Abstract

Detecting small and distant objects remains challenging for object detectors due to scale variation, low resolution, and background clutter. Safety-critical applications require reliable detection of these objects for safe planning. Depth information can improve detection, but existing approaches require complex, model-specific architectural modifications. We provide a theoretical analysis followed by an empirical investigation of the depth-detection relationship. Together, they explain how depth causes systematic performance degradation and why depth-informed supervision mitigates it. We introduce DepthPrior, a framework that uses depth as prior knowledge rather than as a fused feature, providing comparable benefits without modifying detector architectures. DepthPrior consists of Depth-Based Loss Weighting (DLW) and Depth-Based Loss Stratification (DLS) during training, and Depth-Aware Confidence Thresholding (DCT) during inference. The only overhead is the initial cost of depth estimation. Experiments across four benchmarks (KITTI, MS COCO, VisDrone, SUN RGB-D) and two detectors (YOLOv11, EfficientDet) demonstrate the effectiveness of DepthPrior, achieving up to +9% mAP$_S$ and +7% mAR$_S$ for small objects, with inference recovery rates as high as 95:1 (true vs. false detections). DepthPrior offers these benefits without additional sensors, architectural changes, or performance costs. Code is available at https://github.com/mos-ks/DepthPrior.
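Per the Figure 1 caption, DLS uses binary masks to split the loss into close and distant components weighted by $\lambda_{\text{close}}$ and $\lambda_{\text{distant}}$. A minimal sketch, assuming a single hypothetical depth cut-off `depth_split` and averaging within each stratum; the paper's exact split rule and reduction are not given in this excerpt, and `stratified_loss` is an illustrative name.

```python
from typing import Sequence


def stratified_loss(
    losses: Sequence[float],
    depths: Sequence[float],
    depth_split: float,
    lam_close: float,
    lam_distant: float,
) -> float:
    """Hypothetical DLS-style loss: lam_close * mean(close) + lam_distant * mean(distant)."""
    # Binary masks: m_close = 1 if depth < depth_split, m_distant = 1 - m_close.
    m_close = [1.0 if d < depth_split else 0.0 for d in depths]
    m_distant = [1.0 - m for m in m_close]
    n_close, n_distant = sum(m_close), sum(m_distant)
    # Average within each stratum; an empty stratum contributes nothing.
    close_term = sum(m * l for m, l in zip(m_close, losses)) / n_close if n_close else 0.0
    distant_term = (
        sum(m * l for m, l in zip(m_distant, losses)) / n_distant if n_distant else 0.0
    )
    return lam_close * close_term + lam_distant * distant_term


# Usage: two close and two distant objects, distant stratum weighted twice as much.
print(stratified_loss([1.0, 3.0, 2.0, 4.0], [5.0, 10.0, 30.0, 50.0], 20.0, 1.0, 2.0))  # prints 8.0
```

Averaging within each stratum keeps a handful of distant objects from being swamped by many nearby ones, independently of the chosen $\lambda$ values.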

Paper Structure

This paper contains 65 sections, 4 theorems, 26 equations, 20 figures, 19 tables, and 2 algorithms.

Introduction
Related Work
Depth-Aware Object Detection
Multi-Task Learning for Object Detection
Prior Knowledge in Object Detection
Confidence Calibration and Threshold Optimization
Theoretical Considerations
Variance Model of Detection Loss
From Visual Information Degradation to Training Bias
Compensating for Training Bias
From Training to Inference
Methods
Depth-Based Loss Weighting (DLW)
Normalization and Inversion
Exponential Weighting
...and 50 more sections

Key Result

Proposition 3.1

Under the intensity-distance and variance-signal assumptions, the conditional variance of the detection loss is: … where $\sigma_0^2 = \alpha^2/\kappa$.
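Proposition 3.1 states that the variance of the detection loss depends on depth. Without assuming its exact functional form, one can check for this effect empirically by grouping per-object losses into depth bins and comparing the within-bin variances: if the loss is heteroscedastic in depth, those variances differ systematically across bins. A minimal sketch; the bin edges, the use of population variance, and the name `loss_variance_by_depth` are illustrative assumptions, not the paper's protocol.

```python
from statistics import pvariance
from typing import List, Sequence


def loss_variance_by_depth(
    losses: Sequence[float], depths: Sequence[float], bin_edges: Sequence[float]
) -> List[float]:
    """Population variance of per-object losses inside each depth bin [edges[k], edges[k+1])."""
    bins: List[List[float]] = [[] for _ in range(len(bin_edges) - 1)]
    for loss, depth in zip(losses, depths):
        for k in range(len(bin_edges) - 1):
            if bin_edges[k] <= depth < bin_edges[k + 1]:
                bins[k].append(loss)
                break  # each object falls into at most one bin
    # Empty bins get NaN so they are not mistaken for zero variance.
    return [pvariance(b) if b else float("nan") for b in bins]


# Usage: losses spread out more for the distant objects.
print(loss_variance_by_depth([1.0, 1.0, 0.0, 2.0], [5.0, 6.0, 40.0, 45.0], [0.0, 20.0, 60.0]))  # prints [0.0, 1.0]
```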

Figures (20)

Figure 1: DepthPrior framework (notation simplified for clarity). Top (DLW): non-linear weighting $w_i = 1 + \alpha \cdot \exp(d_{i,\text{norm}})$ prioritizes distant objects. Middle (DLS): binary masks decompose loss into close/distant components with weights $\lambda_{\text{close}}, \lambda_{\text{distant}}$. Bottom (DCT): learned splines $\tau(d_{i,\text{norm}})$ adjust thresholds during inference. The framework requires only monocular depth estimation and operates during both training and inference without architectural modifications.
Figure 2: Depth distribution of all GT objects (blue) vs. MD (orange) for EfficientDet on validation data. Object counts shown in parentheses.
Figure 3: Depth-dependent error distributions (MD in red, ED in gray) at confidence thresholds $\tau_0 = 0.4$ (left) and $\tau_0 = 0.9$ (right) for EfficientDet on the validation set.
Figure 4: Match rate heatmaps for YOLOv11 on the validation set. Color indicates fraction of detections matching GT.
Figure 5: Static (blue) vs. DCT-recovered detections (orange) on inference data. Left: KITTI with EfficientDet. Right: VisDrone with YOLOv11.
...and 15 more figures
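DCT (Figure 1, bottom) replaces a single confidence threshold with a depth-dependent curve $\tau(d_{i,\text{norm}})$. The sketch below uses piecewise-linear interpolation between hypothetical knots as a stand-in for the learned splines. It also counts "recovered" detections, taken here to mean detections kept by DCT but rejected by a static threshold $\tau_0$, split into true and false matches; this is the kind of count behind a recovery ratio such as 95:1. The knot values, the dictionary keys, and this definition of recovery are assumptions for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Detection = Dict[str, float]


def piecewise_linear(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the curve through the knots (xs, ys) at x, clamped outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # xs[j - 1] <= x < xs[j]
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def dct_filter(
    dets: Sequence[Detection], xs: Sequence[float], ys: Sequence[float]
) -> List[Detection]:
    """Keep detections whose score clears the depth-dependent threshold tau(d_norm)."""
    return [d for d in dets if d["score"] >= piecewise_linear(xs, ys, d["d_norm"])]


def recovery_counts(
    dets: Sequence[Detection], tau0: float, xs: Sequence[float], ys: Sequence[float]
) -> Tuple[int, int]:
    """(true, false) counts among detections kept by DCT but rejected by the static tau0."""
    recovered = [d for d in dct_filter(dets, xs, ys) if d["score"] < tau0]
    n_true = sum(1 for d in recovered if d["matches_gt"])
    return n_true, len(recovered) - n_true


# Usage: a hypothetical threshold curve that relaxes with normalized depth.
knots_x, knots_y = [0.0, 0.5, 1.0], [0.5, 0.4, 0.25]
dets = [
    {"score": 0.45, "d_norm": 0.1, "matches_gt": False},  # near, below tau(0.1) = 0.48: dropped
    {"score": 0.60, "d_norm": 0.2, "matches_gt": True},   # kept by both thresholds
    {"score": 0.30, "d_norm": 0.9, "matches_gt": True},   # far, recovered true detection
    {"score": 0.35, "d_norm": 0.8, "matches_gt": False},  # far, recovered false detection
]
print(recovery_counts(dets, 0.5, knots_x, knots_y))  # prints (1, 1)
```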

Theorems & Definitions (15)

Definition 3.1: Signal Quality Function
Proposition 3.1: Depth-Induced Heteroscedasticity
Corollary 3.2: Bias Toward Nearby Objects
Remark 3.1
Definition 3.2: Variance-Compensating Weights
Remark 3.2
Definition 3.3: Depth-Based Loss Stratification
Proposition 3.3: Gradient of $\mathcal{L}_{\text{strat}}$
Remark 3.3
Remark 3.4
...and 5 more
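Proposition 3.3 concerns the gradient of $\mathcal{L}_{\text{strat}}$; its statement is not reproduced in this excerpt. For orientation only, assume $\mathcal{L}_{\text{strat}}$ is a per-stratum mean of per-object losses $\ell_i$ with depth-derived binary masks $m_i^{\text{close}} + m_i^{\text{distant}} = 1$. The masks come from the depth prior and do not depend on the network parameters $\theta$, so the gradient follows directly:

```latex
% Assumed form, not confirmed by this excerpt
\mathcal{L}_{\text{strat}}
  = \frac{\lambda_{\text{close}}}{N_{\text{close}}} \sum_i m_i^{\text{close}} \ell_i
  + \frac{\lambda_{\text{distant}}}{N_{\text{distant}}} \sum_i m_i^{\text{distant}} \ell_i,
\qquad N_g = \sum_i m_i^{g}

\frac{\partial \mathcal{L}_{\text{strat}}}{\partial \theta}
  = \sum_i \left( \frac{\lambda_{\text{close}}\, m_i^{\text{close}}}{N_{\text{close}}}
  + \frac{\lambda_{\text{distant}}\, m_i^{\text{distant}}}{N_{\text{distant}}} \right)
  \frac{\partial \ell_i}{\partial \theta}
```

Under this form, each object's gradient is scaled by $\lambda / N$ of its own stratum, so when distant objects are few (small $N_{\text{distant}}$), each of them receives a larger per-object weight.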
