
Leveraging Scene Geometry and Depth Information for Robust Image Deraining

Ningning Xu, Jidong J. Yang

TL;DR

This work tackles image deraining for autonomous driving with a depth-informed, multi-network framework. A deraining autoencoder (DerainAE) is augmented with a DepthNet that injects global scene geometry, while two supervisory signals enforce feature and depth consistency between rainy and clear scenes; a pretrained VAE provides latent cues for the clear image, and a VGG16-based perceptual loss guides training. The model is trained with a composite loss that combines perceptual, depth-consistency, and deraining-consistency terms with the deraining and depth reconstruction terms ($L_{perceptual}$, $L_{depth\_consist}$, $L_{derain\_consist}$, $L_{derain}$, $L_{depth}$). Evaluations on RainCityScapes, RainKITTI2012, and RainKITTI2015 show higher PSNR/SSIM and faster inference than the baselines, and ablation studies confirm the contribution of the depth latent and of depth-feature concatenation. Vehicle-detection experiments show gains in recall when detection runs on derained images. The approach targets robust perception in rain, supporting more reliable autonomous driving under adverse weather conditions.
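A composite loss of this kind is commonly implemented as a weighted sum of its terms. The sketch below illustrates that pattern for the five terms named above. The `LossWeights` class, the `composite_loss` function, and the default weights of 1.0 are assumptions made for illustration; this summary does not state the weights or the exact combination rule the authors use.

```python
from dataclasses import dataclass

# Names of the five loss terms listed in the TL;DR.
TERM_NAMES = ("perceptual", "depth_consist", "derain_consist", "derain", "depth")


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical per-term weights; the actual values are not given in this summary."""
    perceptual: float = 1.0
    depth_consist: float = 1.0
    derain_consist: float = 1.0
    derain: float = 1.0
    depth: float = 1.0


def composite_loss(terms: dict, weights: LossWeights = LossWeights()) -> float:
    """Return the weighted sum of the five named loss terms.

    `terms` maps each name in TERM_NAMES to an already-computed scalar loss.
    """
    missing = set(TERM_NAMES) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(getattr(weights, name) * terms[name] for name in TERM_NAMES)


if __name__ == "__main__":
    # Illustrative scalar values only, not results from the paper.
    example_terms = {
        "perceptual": 0.5,
        "depth_consist": 0.25,
        "derain_consist": 0.25,
        "derain": 1.0,
        "depth": 2.0,
    }
    print(composite_loss(example_terms))
    print(composite_loss(example_terms, LossWeights(depth=0.5)))
```

In a training loop the scalar values would be replaced by differentiable tensors from the framework in use; the weighted-sum structure stays the same.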

Abstract

Image deraining holds great potential for enhancing the vision of autonomous vehicles in rainy conditions, contributing to safer driving. Previous works have primarily focused on employing a single network architecture to generate derained images. However, they often fail to fully exploit the rich prior knowledge embedded in the scenes. In particular, most methods overlook depth information, which can provide valuable context about scene geometry and guide more robust deraining. In this work, we introduce a novel learning framework that integrates multiple networks: an AutoEncoder for deraining, an auxiliary network to incorporate depth information, and two supervision networks to enforce feature consistency between rainy and clear scenes. This multi-network design enables our model to effectively capture the underlying scene structure, producing clearer and more accurately derained images and, in turn, improved object detection for autonomous vehicles. Extensive experiments on three widely used datasets demonstrate the effectiveness of our proposed method.

Paper Structure (14 sections, 5 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: The overall architecture of our model. A pretrained VAE extracts clear features, while the DerainAE and DepthNet modules handle rainy images. Comparing clear and rainy features in the latent space improves both depth estimation and derained-image prediction.
  • Figure 2: An overview of our DepthNet and DerainAE architectures. Left: DepthNet, which employs a U-Net structure with skip connections from each encoder layer to the corresponding decoder layer. The network outputs two disparity maps, with Disp0 used as the final predicted depth map. Right: DerainAE, a simple convolutional network with skip connections at corresponding feature levels between encoder and decoder.
  • Figure 3: Visualization results on the RainCityScapes and RainKITTI2012 datasets. The first two columns show example images from the RainCityScapes dataset and the corresponding derained outputs; the last two columns show example images from the RainKITTI2012 dataset and the corresponding derained outputs.
  • Figure 4: Vehicle detection results using YOLOv11 on the RainKITTI2015 dataset.
  • Figure 5: Vehicle detection results using YOLOv11 on the RainCityScapes dataset.
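
The vehicle-detection comparisons in Figures 4 and 5 are summarized above in terms of recall. As a reminder of what that metric measures, here is a minimal sketch that computes recall by greedily matching ground-truth boxes to predicted boxes at an IoU threshold. The function names, the `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the greedy matching rule are assumptions made for illustration; this summary does not describe the authors' evaluation protocol.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_recall(ground_truth, predictions, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by a distinct prediction.

    Each ground-truth box is matched to the unused prediction with the
    highest IoU, provided that IoU reaches `iou_threshold` (an assumed value).
    Returns 0.0 when there are no ground-truth boxes.
    """
    if not ground_truth:
        return 0.0
    used = set()
    true_positives = 0
    for gt_box in ground_truth:
        best_index, best_score = None, iou_threshold
        for index, pred_box in enumerate(predictions):
            if index in used:
                continue
            score = iou(gt_box, pred_box)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is not None:
            used.add(best_index)
            true_positives += 1
    return true_positives / len(ground_truth)


if __name__ == "__main__":
    # Illustrative boxes only, not data from the paper.
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds_rainy = [(1, 1, 10, 10)]
    preds_derained = [(1, 1, 10, 10), (20, 20, 30, 30)]
    print(detection_recall(gt, preds_rainy))
    print(detection_recall(gt, preds_derained))
```

Running the same detector on rainy and derained versions of an image and comparing the two recall values is one way to quantify the kind of gain the figures illustrate.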