HazyDet: Open-Source Benchmark for Drone-View Object Detection with Depth-Cues in Hazy Scenes

Changfeng Feng, Zhenyuan Chen, Xiang Li, Chunping Wang, Jian Yang, Ming-Ming Cheng, Yimian Dai, Qiang Fu

TL;DR

The paper tackles drone-view object detection under hazy conditions by introducing HazyDet, a large-scale benchmark of 383k instances drawn from real-world hazy captures and synthetically hazed scenes, designed to reflect aerial perspectives and depth-related degradation. It proposes DeCoDet, a Depth-Conditioned Detector that uses Depth-Conditioned Kernels and multi-scale depth priors to modulate feature representations based on depth cues, avoiding explicit dehazing. Training employs Progressive Domain Fine-Tuning (PDFT) to bridge the synthetic-to-real domain gap and a Scale-Invariant Refurbishment Loss (SIRLoss) to learn robustly from noisy depth annotations, yielding state-of-the-art performance with a +1.5% mAP improvement over the closest competitor on challenging real hazy data. The dataset and toolkit are open-sourced, enabling principled evaluation and reproducible development for robust UAV perception in adverse weather. The approach treats depth as a core modality for haze resilience, linking atmospheric scattering to object scale and visibility in aerial imagery through the Atmospheric Scattering Model (ASM): $I(x,y)=J(x,y)t(x,y)+A(1-t(x,y))$, $t(x,y)=e^{-\beta d(x,y)}$, where $I$ is the observed hazy image, $J$ the clear scene radiance, $t$ the transmission, $A$ the global atmospheric light, $\beta$ the scattering coefficient, and $d$ the scene depth.
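To make the ASM concrete, here is a minimal pure-Python sketch that applies $I = Jt + A(1-t)$ with $t = e^{-\beta d}$ pixel by pixel. The function names (`transmission`, `add_haze`) and the values chosen for `beta` and `airlight` are illustrative assumptions, not taken from the paper or the HazyDet toolkit.

```python
import math

def transmission(depth, beta):
    """t = exp(-beta * d): fraction of scene radiance that survives scattering."""
    return math.exp(-beta * depth)

def add_haze(clear, depth, beta=0.05, airlight=0.8):
    """Apply I = J*t + A*(1 - t) per pixel.

    clear:    2-D list of intensities in [0, 1] (J)
    depth:    2-D list of depths with the same shape (d)
    beta, airlight: illustrative values, not the paper's settings
    """
    hazy = []
    for row_j, row_d in zip(clear, depth):
        hazy_row = []
        for j, d in zip(row_j, row_d):
            t = transmission(d, beta)
            hazy_row.append(j * t + airlight * (1.0 - t))
        hazy.append(hazy_row)
    return hazy

# A uniformly dark scene whose pixels lie at increasing depth.
clear = [[0.2, 0.2], [0.2, 0.2]]
depth = [[0.0, 10.0], [50.0, 200.0]]
hazy = add_haze(clear, depth)
```

At zero depth the pixel is unchanged, and as depth grows the value drifts toward the atmospheric light. This is why distant objects in a drone view lose contrast first.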

Abstract

Object detection from aerial platforms under adverse atmospheric conditions, particularly haze, is paramount for robust drone autonomy. Yet, this domain remains largely underexplored, primarily hindered by the absence of specialized benchmarks. To bridge this gap, we present HazyDet, the first large-scale benchmark specifically designed for drone-view object detection in hazy conditions. Comprising 383,000 real-world instances derived from both naturally hazy captures and synthetically hazed scenes augmented from clear images, HazyDet provides a challenging and realistic testbed for advancing detection algorithms. To address the severe visual degradation induced by haze, we propose the Depth-Conditioned Detector (DeCoDet), a novel architecture that integrates a Depth-Conditioned Kernel to dynamically modulate feature representations based on depth cues. The practical efficacy and robustness of DeCoDet are further enhanced by its training with a Progressive Domain Fine-Tuning (PDFT) strategy to navigate synthetic-to-real domain shifts, and a Scale-Invariant Refurbishment Loss (SIRLoss) to ensure resilient learning from potentially noisy depth annotations. Comprehensive empirical validation on HazyDet substantiates the superiority of our unified DeCoDet framework, which achieves state-of-the-art performance, surpassing the closest competitor by a notable +1.5% mAP on challenging real-world hazy test scenarios. Our dataset and toolkit are available at https://github.com/GrokCV/HazyDet.
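The abstract does not give the form of SIRLoss, so the following sketch shows only the general idea of a scale-invariant depth loss, using the classic log-space formulation from the monocular depth-estimation literature. It is not the paper's SIRLoss and it omits the refurbishment component. The function name and the `lam` parameter are assumptions made for illustration.

```python
import math

def scale_invariant_log_loss(pred, target, lam=1.0):
    """Classic scale-invariant log loss over positive depth values.

    With diffs d_i = log(pred_i) - log(target_i):
        loss = mean(d_i^2) - lam * mean(d_i)^2
    For lam = 1, multiplying every prediction by a constant leaves
    the loss unchanged, so only relative depth structure is penalized.
    """
    n = len(pred)
    diffs = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    mean_sq = sum(d * d for d in diffs) / n
    sq_mean = (sum(diffs) / n) ** 2
    return mean_sq - lam * sq_mean

target = [2.0, 4.0, 8.0]
loss_exact = scale_invariant_log_loss(target, target)
loss_scaled = scale_invariant_log_loss([3.0 * t for t in target], target)
loss_wrong = scale_invariant_log_loss([2.0, 2.0, 2.0], target)
```

A globally rescaled prediction scores the same as an exact one, while a prediction with the wrong relative structure is penalized. That property is useful when the supervising depth maps are pseudo-labels whose absolute scale cannot be trusted.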

Paper Structure (25 sections, 5 equations, 12 figures, 6 tables)

Figures (12)

  • Figure 1: Challenges faced by drone-view object detection under hazy conditions.
  • Figure 2: Data Landscape and Domain Fine-Tuning Strategy. (a) HazyDet uniquely addresses the critical gap in drone-view adverse-weather scenarios. (b) PDFT paradigm bridges the simulated-to-real domain gap.
  • Figure 3: Construction pipeline and representative samples of the HazyDet dataset. (a) Simulated data is generated using physics-based simulation based on the Atmospheric Scattering Model (ASM), while semi-automatic annotation techniques are employed to improve the accuracy and efficiency of annotations for real-world data. (b) Clean images; (c) Synthetically generated hazy images; (d) Real-world hazy images captured in HazyDet.
  • Figure 4: The framework of DeCoDet. The network comprises a backbone and a feature pyramid network for feature extraction. Each detection head incorporates depth-specific convolutional modules that infer depth maps at multiple scales; the Scale-Invariant Refurbishment Loss (SIRLoss) is computed against pseudo-depth maps to guide the generation of depth priors. The intermediate depth features carrying these priors are then used by the Depth-Conditioned Kernel (DCK) module to dynamically generate filter kernels, which adaptively modulate the detection features.
  • Figure 5: Network visualization and robustness analysis. (a) Feature visualization comparing baseline and DeCoDet models under foggy conditions, displaying ground truth (left) alongside Grad-CAM activation maps across backbone layers C2-C5. (b) Depth estimation visualization across network layers, showing original ground truth (left) and corresponding depth maps at different levels (right). The upper row presents learned depth maps while the lower row shows network predictions. Color bar indicates depth in meters. (c) Network robustness verification. The left columns show the original ground truth, while the right columns present the depth maps used for DeCoDet training. The noise level denotes the variance of the added Gaussian noise, with 0 indicating no noise. Values at the bottom reflect the network's performance with noise-aware labels on both simulated and real-world data.
  • ...and 7 more figures
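
The Figure 4 caption says the Depth-Conditioned Kernel module generates filter kernels dynamically from depth features. As a rough illustration of that idea only, the sketch below performs spatially varying filtering in which each pixel's 3x3 kernel is produced from a single depth value. The softmax projection, the `weights` and `biases` parameters, and the function names are hypothetical stand-ins and do not reproduce the DCK architecture.

```python
import math

# Offsets of a 3x3 neighborhood in row-major order (index 4 is the center).
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def kernel_from_depth(depth_value, weights, biases):
    """Map one depth value to 9 softmax-normalized kernel weights.

    weights/biases stand in for a learned projection (hypothetical).
    """
    logits = [w * depth_value + b for w, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def depth_conditioned_filter(feature, depth, weights, biases):
    """Filter `feature` with a different 3x3 kernel at every pixel.

    Out-of-bounds neighbors are treated as zero (zero padding).
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            kernel = kernel_from_depth(depth[y][x], weights, biases)
            acc = 0.0
            for k, (dy, dx) in enumerate(OFFSETS):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    acc += kernel[k] * feature[yy][xx]
            out[y][x] = acc
    return out

# Hypothetical parameters: near pixels keep a sharp, center-heavy kernel,
# while pixels at depth 40 fall back to uniform averaging.
weights = [0.0, 0.0, 0.0, 0.0, -0.1, 0.0, 0.0, 0.0, 0.0]
biases = [0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
near_kernel = kernel_from_depth(0.0, weights, biases)
far_kernel = kernel_from_depth(40.0, weights, biases)

feature = [[1.0] * 3 for _ in range(3)]
depth = [[0.0, 10.0, 20.0], [10.0, 20.0, 40.0], [20.0, 40.0, 40.0]]
out = depth_conditioned_filter(feature, depth, weights, biases)
```

The point of the sketch is that the filter is not a fixed learned weight shared across the image. It changes per location as a function of depth, so regions degraded differently by haze can be processed differently.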