NightRain: Nighttime Video Deraining via Adaptive-Rain-Removal and Adaptive-Correction

Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Shunli Zhang, Robby Tan

TL;DR

NightRain tackles the challenge of real-world nighttime rain removal by bridging the synthetic-real domain gap through adaptive-rain-removal and adaptive-correction. It uses a teacher-student diffusion framework with region-based supervision and confidence maps to learn from unlabeled rain videos and real clear night footage, updating the teacher via Exponential Moving Average. The method demonstrates state-of-the-art restoration with a PSNR of 26.73 dB on SynNightRain and reduces artifacts such as over-saturation and color shifts. This framework reduces reliance on paired data and shows strong generalization to real-world nighttime rain scenarios, with practical implications for video enhancement in low-light conditions.

Abstract

Existing deep-learning-based methods for nighttime video deraining rely on synthetic data due to the absence of real-world paired data. However, the intricacies of the real world, particularly with the presence of light effects and low-light regions affected by noise, create significant domain gaps, hampering synthetic-trained models in removing rain streaks properly and leading to over-saturation and color shifts. Motivated by this, we introduce NightRain, a novel nighttime video deraining method with adaptive-rain-removal and adaptive-correction. Our adaptive-rain-removal uses unlabeled rain videos to enable our model to derain real-world rain videos, particularly in regions affected by complex light effects. The idea is to allow our model to obtain rain-free regions based on the confidence scores. Once rain-free regions and the corresponding regions from our input are obtained, we can have region-based paired real data. These paired data are used to train our model using a teacher-student framework, allowing the model to iteratively learn from less challenging regions to more challenging regions. Our adaptive-correction aims to rectify errors in our model's predictions, such as over-saturation and color shifts. The idea is to learn from clear night input training videos based on the differences or distance between those input videos and their corresponding predictions. Our model learns from these differences, compelling our model to correct the errors. From extensive experiments, our method demonstrates state-of-the-art performance. It achieves a PSNR of 26.73 dB, surpassing existing nighttime video deraining methods by a substantial margin of 13.7%.
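
The adaptive-rain-removal loop described above (keep only high-confidence regions of the teacher's prediction as pseudo ground truth, train the student on them, then update the teacher by Exponential Moving Average) can be sketched as follows. This is a toy illustration, not the authors' implementation: the function names `masked_l1` and `ema_update`, the confidence threshold of 0.8, the EMA decay of 0.999, and the choice of an L1 loss are all assumptions, since the abstract does not specify them. Frames are plain nested lists of pixel intensities and model weights are a dictionary of floats.

```python
from typing import Dict, List

Frame = List[List[float]]  # one grayscale frame as rows of pixel intensities


def masked_l1(student_pred: Frame, pseudo_target: Frame,
              confidence: Frame, threshold: float = 0.8) -> float:
    """Mean absolute error over pixels whose confidence is >= threshold.

    `pseudo_target` stands in for the teacher's derained prediction.
    The threshold value is an assumption made for this sketch.
    Returns 0.0 when no pixel passes the threshold.
    """
    total, count = 0.0, 0
    for row_p, row_t, row_c in zip(student_pred, pseudo_target, confidence):
        for p, t, c in zip(row_p, row_t, row_c):
            if c >= threshold:  # region treated as reliably rain-free
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}


if __name__ == "__main__":
    pred = [[1.0, 2.0]]
    target = [[0.0, 0.0]]
    conf = [[0.9, 0.1]]  # only the first pixel is high-confidence
    print(masked_l1(pred, target, conf))
    print(ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9))
```

Because low-confidence pixels contribute nothing to the loss, the student is supervised only where the teacher is trusted. As the EMA-updated teacher improves, more regions can pass the threshold, which matches the abstract's description of learning from less challenging to more challenging regions.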

Paper Structure (11 sections, 7 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Qualitative results on real-world nighttime rain videos. First column: input image. Second column: results of MetaRain [patil2022video]. Third column: our results. Zoom in for better visualization.
  • Figure 2: Overview of our adaptive-rain-removal. We pre-train a video diffusion model on synthetic nighttime deraining datasets as a teacher model. Our adaptive-rain-removal utilizes the teacher model to generate predictions from real-world nighttime rain videos. We also generate confidence maps of rain streak removal within these predictions (green lines). The high-confidence predictions with their corresponding inputs are then selected to train a student model, thus reducing the domain gap (blue lines). Finally, we utilize Exponential Moving Average (EMA) to update our teacher model.
  • Figure 3: Overview of our adaptive-correction. We utilize our teacher model to generate predictions from nighttime clear videos (grey lines). The difference regions between clear videos and their corresponding predictions essentially represent errors produced by our model itself. We then use these difference pairs to train a student model, thus correcting our model's errors (blue lines). Finally, we utilize Exponential Moving Average (EMA) to update our teacher model.
  • Figure 4: Overview of our transformer-based noise-estimation network. Given a sequence and a time step as inputs, we split the sequence into patches and encode the time step as a time embedding. Several transformer blocks then extract global spatiotemporal cues. Finally, a linear layer converts the feature representations into noise maps.
  • Figure 5: Visualization of predictions and confidence maps from our pretrained model. The red and blue regions are low-confidence and high-confidence predictions, respectively.
  • ...and 4 more figures
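
The adaptive-correction idea in the Figure 3 caption (a clear night frame should pass through the model unchanged, so any region where the prediction differs from the input marks a model error) can be sketched as below. This is a hedged toy example rather than the paper's method: the helper names `difference_mask` and `correction_loss`, the tolerance of 0.05, and the L1 penalty are assumptions introduced for illustration only.

```python
from typing import List

Frame = List[List[float]]      # one grayscale frame as rows of pixel intensities
Mask = List[List[bool]]


def difference_mask(clear: Frame, teacher_pred: Frame, tol: float = 0.05) -> Mask:
    """Mark pixels where the teacher altered a clear frame by more than `tol`.

    The tolerance is an assumed value; marked pixels represent errors
    such as over-saturation or color shifts.
    """
    return [[abs(p - c) > tol for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(teacher_pred, clear)]


def correction_loss(student_pred: Frame, clear: Frame, mask: Mask) -> float:
    """Mean absolute error between student output and the clear frame,
    restricted to the masked (erroneous) pixels. Returns 0.0 for an empty mask.
    """
    total, count = 0.0, 0
    for row_s, row_c, row_m in zip(student_pred, clear, mask):
        for s, c, m in zip(row_s, row_c, row_m):
            if m:
                total += abs(s - c)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    clear = [[0.2, 0.5]]
    teacher_pred = [[0.2, 0.9]]          # second pixel is over-saturated
    mask = difference_mask(clear, teacher_pred)
    print(mask)
    print(correction_loss([[0.2, 0.7]], clear, mask))
```

Here the clear input itself acts as the ground truth, so no paired rain data is needed for this step; the student is penalized only where the teacher introduced a visible change.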