Leveraging Scene Geometry and Depth Information for Robust Image Deraining
Ningning Xu, Jidong J. Yang
TL;DR
This work tackles image deraining for autonomous driving with a depth-informed, multi-network framework. A Deraining AutoEncoder (DerainAE) is augmented with a DepthNet that injects global scene geometry, and two supervisory signals enforce feature and depth consistency between rainy and clear scenes: a pretrained VAE provides latent cues for the clear image, and a VGG16-based perceptual loss guides training. The model is trained with a composite loss of five terms: a perceptual term ($L_{perceptual}$), depth-consistency and deraining-consistency terms ($L_{depth\_consist}$, $L_{derain\_consist}$), and reconstruction terms ($L_{derain}$, $L_{depth}$). Evaluations on RainCityScapes, RainKITTI2012, and RainKITTI2015 show higher PSNR/SSIM and faster inference than the baselines, and ablation studies confirm the contribution of the depth latent and of depth-feature concatenation. Vehicle-detection experiments show gains in recall when detection is run on derained images. The approach supports more robust perception in rain and, in turn, more reliable autonomous driving under adverse weather.
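As a rough illustration of how the five loss terms named above could be combined, the sketch below forms a weighted sum. The weights, the `LossWeights` class, and the `composite_loss` function are hypothetical; this summary does not state how the terms are balanced, so equal weights are assumed purely for illustration.

```python
from dataclasses import dataclass

# Names of the five terms listed in the TL;DR.
TERM_NAMES = ("perceptual", "depth_consist", "derain_consist", "derain", "depth")


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights: the source does not give the actual values.
    perceptual: float = 1.0
    depth_consist: float = 1.0
    derain_consist: float = 1.0
    derain: float = 1.0
    depth: float = 1.0


def composite_loss(terms, weights=LossWeights()):
    """Return the weighted sum of the five scalar loss terms."""
    missing = [name for name in TERM_NAMES if name not in terms]
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    # Fixed iteration order keeps the floating-point sum deterministic.
    return sum(getattr(weights, name) * terms[name] for name in TERM_NAMES)


if __name__ == "__main__":
    example = {
        "perceptual": 0.5,
        "depth_consist": 0.25,
        "derain_consist": 0.25,
        "derain": 1.0,
        "depth": 2.0,
    }
    print(composite_loss(example))
```

Setting a weight to zero, for example `LossWeights(depth=0.0)`, removes that term from the sum, which is one simple way to mimic an ablation over loss terms.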
Abstract
Image deraining holds great potential for enhancing the vision of autonomous vehicles in rainy conditions, contributing to safer driving. Previous works have primarily focused on employing a single network architecture to generate derained images, and they often fail to fully exploit the rich prior knowledge embedded in the scenes. In particular, most methods overlook depth information, which provides valuable context about scene geometry and can guide more robust deraining. In this work, we introduce a novel learning framework that integrates multiple networks: an AutoEncoder for deraining, an auxiliary network to incorporate depth information, and two supervision networks to enforce feature consistency between rainy and clear scenes. This multi-network design enables our model to capture the underlying scene structure, producing clearer and more accurately derained images and, in turn, improved object detection for autonomous vehicles. Extensive experiments on three widely used datasets demonstrate the effectiveness of the proposed method.
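To make the idea of feature consistency between rainy and clear scenes concrete, here is a minimal sketch that penalizes the mean squared difference between two flattened feature vectors. The choice of mean squared error and the `feature_consistency` name are assumptions for illustration; the abstract does not specify the distance used.

```python
def feature_consistency(rainy_feats, clear_feats):
    """Mean squared difference between two equal-length feature vectors.

    Assumption: MSE stands in for whatever distance the paper actually uses.
    """
    if len(rainy_feats) != len(clear_feats):
        raise ValueError("feature vectors must have the same length")
    if not rainy_feats:
        raise ValueError("feature vectors must be non-empty")
    squared = ((r - c) ** 2 for r, c in zip(rainy_feats, clear_feats))
    return sum(squared) / len(rainy_feats)


if __name__ == "__main__":
    # Identical features give zero penalty; differing features are penalized.
    print(feature_consistency([1.0, 2.0], [1.0, 2.0]))
    print(feature_consistency([0.0, 2.0], [0.0, 0.0]))
```

The penalty is zero only when the rainy-scene features match the clear-scene features exactly, so minimizing it pushes the two representations together.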
