Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video
Junkai Fan, Kun Wang, Zhiqiang Yan, Xiang Chen, Shangbing Gao, Jun Li, Jian Yang
TL;DR
This work tackles simultaneous dehazing and depth estimation from real-world monocular hazy video by unifying the atmospheric scattering model (ASM) with the brightness consistency constraint (BCC) through a shared depth network. It introduces a depth-centric learning (DCL) framework that leverages adjacent dehazed frames, a non-aligned reference strategy, and two discriminators: MFIR, which recovers high-frequency detail in dehazed frames, and MDR, which mitigates depth holes in weak-texture areas. The method achieves state-of-the-art performance on real hazy video benchmarks (GoProHazy, DrivingHazy, InternetHazy) and superior depth estimation on DENSE-Fog, while offering inference fast enough for driving scenarios. The results highlight the benefits of combining a physics-based haze model with self-supervised cues and non-aligned reference regularization to handle real-world haze and texture variations robustly.
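The ASM explains why better depth helps dehazing: a hazy frame $I$ is modeled as $I(x) = J(x)\,t(x) + A\,(1 - t(x))$, where $J$ is the clear frame, $A$ the global atmospheric light, and $t(x) = e^{-\beta d(x)}$ the transmission at depth $d(x)$ for scattering coefficient $\beta$. The following is a minimal NumPy sketch of the forward model and its inversion, assuming a known scalar $A$ and $\beta$ (in practice these must be estimated). The function names and the lower bound `t_min` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def transmission_from_depth(depth: np.ndarray, beta: float) -> np.ndarray:
    """Transmission t(x) = exp(-beta * d(x)) for a per-pixel depth map."""
    return np.exp(-beta * depth)


def synthesize_haze(clear: np.ndarray, depth: np.ndarray,
                    airlight: float, beta: float) -> np.ndarray:
    """Forward ASM: I = J * t + A * (1 - t). clear is (H, W, 3), depth is (H, W)."""
    t = transmission_from_depth(depth, beta)[..., None]  # (H, W, 1), broadcasts over RGB
    return clear * t + airlight * (1.0 - t)


def dehaze(hazy: np.ndarray, depth: np.ndarray, airlight: float,
           beta: float, t_min: float = 0.1) -> np.ndarray:
    """Inverse ASM: J = (I - A) / max(t, t_min) + A, clipped to [0, 1].

    t_min (hypothetical default) keeps distant, low-transmission pixels
    from amplifying noise when dividing by t.
    """
    t = np.maximum(transmission_from_depth(depth, beta), t_min)[..., None]
    return np.clip((hazy - airlight) / t + airlight, 0.0, 1.0)
```

With an accurate depth map the inversion recovers the clear frame exactly (up to floating point) whenever $t \ge$ `t_min`; errors in $d(x)$ translate directly into errors in $t$ and hence in $J$, which is the coupling the abstract describes.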
Abstract
In this paper, we study the challenging problem of simultaneously removing haze and estimating depth from real monocular hazy videos. These tasks are inherently complementary: better depth estimation improves dehazing via the atmospheric scattering model (ASM), while better dehazing yields more accurate depth estimation through the brightness consistency constraint (BCC). To tackle these intertwined tasks, we propose a novel depth-centric learning framework that integrates ASM with BCC. Our key idea is that both ASM and BCC rely on a shared depth estimation network. This network exploits adjacent dehazed frames to enhance depth estimation via BCC and, in turn, uses the refined depth cues to remove haze more effectively through ASM. Additionally, we leverage a non-aligned clear video and its estimated depth to independently regularize the dehazing and depth estimation networks. This is achieved with two discriminator networks: $D_{MFIR}$ enhances high-frequency details in dehazed videos, and $D_{MDR}$ reduces black holes (missing depth) in low-texture regions of the estimated depth maps. Extensive experiments demonstrate that the proposed method outperforms current state-of-the-art techniques in both video dehazing and depth estimation, especially in real-world hazy scenes. Project page: https://fanjunkai1.github.io/projectpage/DCL/index.html.
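Under BCC, the dehazed target frame should match an adjacent dehazed frame after that frame is inverse-warped into the target view using the predicted depth and the relative camera pose, which is the usual photometric objective of self-supervised monocular depth learning. The following is a minimal NumPy sketch under stated assumptions: pinhole intrinsics $K$ and the target-to-source pose $(R, t)$ are given rather than predicted, sampling is nearest-neighbour instead of bilinear, and the loss is plain L1 (self-supervised depth methods often add an SSIM term). The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def backproject(depth: np.ndarray, K_inv: np.ndarray) -> np.ndarray:
    """Lift every target pixel (u, v) to the 3-D point D(u, v) * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    return (K_inv @ pix) * depth.reshape(1, -1)  # shape (3, H*W), row-major pixel order


def warp_to_target(source, depth_t, K, R, trans):
    """Inverse-warp a source frame (H, W, C) into the target view.

    (R, trans) maps target-camera points into the source camera.
    Returns the warped frame (H, W, C) and a boolean validity mask (H, W).
    """
    h, w = depth_t.shape
    pts_s = R @ backproject(depth_t, np.linalg.inv(K)) + trans.reshape(3, 1)
    proj = K @ pts_s
    z = proj[2]
    in_front = z > 1e-6
    z_safe = np.where(in_front, z, 1.0)  # avoid dividing by zero or negative depth
    u = np.round(proj[0] / z_safe).astype(np.int64)
    v = np.round(proj[1] / z_safe).astype(np.int64)
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = np.zeros((h * w, source.shape[-1]), dtype=source.dtype)
    idx = np.flatnonzero(valid)
    warped[idx] = source[v[idx], u[idx]]  # nearest-neighbour sampling
    return warped.reshape(h, w, -1), valid.reshape(h, w)


def bcc_loss(target, source, depth_t, K, R, trans) -> float:
    """Mean L1 photometric error over pixels that project inside the source frame."""
    warped, valid = warp_to_target(source, depth_t, K, R, trans)
    if not valid.any():
        return 0.0
    return float(np.abs(target - warped)[valid].mean())
```

With an identity pose the warped frame reproduces the source and the loss is zero. A sideways camera translation $t_x$ shifts the sampled pixels by $f_x t_x / d$, so the photometric error depends on depth; because haze corrupts the brightness values being compared, cleaner dehazed frames make this signal more reliable.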
