UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining

Pei Yang, Hai Ci, Beibei Lin, Yiren Song, Mike Zheng Shou

Abstract

Nighttime video deraining is uniquely challenging because raindrops interact with artificial lighting. Unlike daytime white rain, nighttime rain takes on various colors and appears locally illuminated. Existing small-scale synthetic datasets rely on 2D rain overlays and fail to capture these physical properties, causing models to generalize poorly to real-world night rain. Meanwhile, capturing real paired nighttime videos remains impractical because rain effects cannot be isolated from other degradations such as sensor noise. To bridge this gap, we introduce UENR-600K, a large-scale, physically grounded dataset containing 600,000 1080p frame pairs. We use Unreal Engine to simulate rain as 3D particles within virtual environments. This approach guarantees photorealism and physically faithful raindrops, capturing details such as color refraction, scene occlusion, and rain curtains. Leveraging this high-quality data, we establish a new state-of-the-art baseline by adapting the Wan 2.2 video generation model. Our baseline treats deraining as a video-to-video generation task, exploiting strong generative priors to almost entirely bridge the sim-to-real gap. Extensive benchmarking demonstrates that models trained on our dataset generalize significantly better to real-world videos. Project page: https://showlab.github.io/UENR-600K/.

Paper Structure

This paper contains 53 sections, 3 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Paper Overview. Top: We use Unreal Engine 5 to simulate rain as 3D particles within virtual environments, producing 600,000 paired 1080p frames with physically grounded nighttime rain. Left: a rainy video frame; right: the paired ground truth. Bottom: We finetune a video Diffusion Transformer on our dataset for nighttime video deraining. Given a real nighttime rain video (left), our baseline removes rain and rain-induced fog near the streetlamp while preserving scene detail (right; see red crop).
  • Figure 2: Properties of nighttime rain illustrated with frames from our dataset. Chromaticity: raindrops refract colored artificial light (blue, yellow, green, red) rather than appearing white. Localization: rain is visible near light sources but fades in unlit regions. Glimmer effect: raindrops produce sudden high-intensity flashes as they pass through focused light beams. Rain curtains: wind-driven sheets of rain form volumetric, shifting patterns.
  • Figure 3: Comparison of rain synthesis between SynNightRain (top row) and our dataset (bottom row). Each pair shows a ground-truth frame alongside its rainy counterpart. SynNightRain overlays rain as a global white layer that uniformly covers the entire frame, without responding to scene geometry or lighting. Our dataset simulates rain within a virtual scene: raindrops are correctly occluded by scene objects, form volumetric rain curtains, and refract local artificial light to produce colorful, spatially varying streaks.
  • Figure 4: Our baseline architecture, adapted from the Wan 2.2 Video DiT. The rainy input is encoded into condition tokens (blue) and concatenated with generation tokens (red); the DiT denoises only the generation tokens while using the condition tokens as context. A unidirectional attention mask prevents condition tokens from attending to generation tokens, keeping the input uncorrupted. Only LoRA adapters on the QKV projections are trained; all other parameters stay frozen. (A minimal sketch of this masking scheme appears after the figure list.)
  • Figure 5: Qualitative comparison of all eight methods on four real nighttime rain scenes. All methods are trained on our dataset. Red annotations highlight regions for comparison. Existing restoration methods (ESTINet through UConNet) reduce rain to varying degrees but leave visible streaks in heavy-rain regions. The two 64×64 diffusion models (WeatherDiff, NightRain) introduce haze or darken the scene. Our baseline, leveraging its pretrained generative prior, removes rain almost completely without introducing artifacts, demonstrating that the sim-to-real gap is effectively bridged.
  • ...and 4 more figures
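
The unidirectional masking described in the Figure 4 caption can be illustrated with a short PyTorch sketch. This is an illustrative reconstruction, not the authors' released code: the function name joint_attention_mask, the tensor shapes, and the use of scaled_dot_product_attention are assumptions; only the masking rule itself (condition tokens never attend to generation tokens, while generation tokens attend to everything) comes from the caption.

```python
# Illustrative sketch of the unidirectional attention mask in Figure 4.
# Hypothetical names and shapes; only the masking rule is from the paper:
# condition tokens (rainy input) may not attend to generation tokens, so
# the conditioning stays uncorrupted while generation tokens see all context.
import torch
import torch.nn.functional as F

def joint_attention_mask(n_cond: int, n_gen: int) -> torch.Tensor:
    """Boolean mask over a [condition | generation] token sequence.

    True = attention allowed. Rows index queries, columns index keys.
    """
    n = n_cond + n_gen
    mask = torch.ones(n, n, dtype=torch.bool)
    mask[:n_cond, n_cond:] = False  # condition queries ignore generation keys
    return mask

# Toy usage: 4 condition tokens, 12 generation tokens, batch 1, 8 heads, dim 64.
cond = torch.randn(1, 8, 4, 64)
gen = torch.randn(1, 8, 12, 64)
x = torch.cat([cond, gen], dim=2)              # joint sequence, as in Figure 4
mask = joint_attention_mask(n_cond=4, n_gen=12)
out = F.scaled_dot_product_attention(x, x, x, attn_mask=mask)
# The DiT would then update only out[..., 4:, :] (the generation tokens).
```

Consistent with the caption, only LoRA adapters on the QKV projections would be trained under this scheme; the attention computation itself and all other pretrained weights remain frozen.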