Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

Jinlong Li, Baolu Li, Zhengzhong Tu, Xinyu Liu, Qing Guo, Felix Juefei-Xu, Runsheng Xu, Hongkai Yu

TL;DR

This work tackles vision-based autonomous driving under low-light conditions by introducing LightDiff, a diffusion-based enhancement framework that requires no manually collected paired night–day data. It relies on a dynamic low-light degradation pipeline to synthesize training pairs, a multi-condition adapter to fuse degraded RGB, depth, and text cues, and a perception-guided reinforcement learning objective to align image enhancement with downstream detectors. A recurrent inference strategy further refines the text prompts and depth maps used as conditions. On nuScenes, LightDiff significantly improves nighttime 3D vehicle detection for state-of-the-art detectors and achieves strong visual quality scores, highlighting its potential to improve safety in real-world night driving.

Abstract

Vision-centric perception systems for autonomous driving have gained considerable attention recently due to their cost-effectiveness and scalability, especially compared to LiDAR-based systems. However, these systems often struggle in low-light conditions, potentially compromising their performance and safety. To address this, our paper introduces LightDiff, a domain-tailored framework designed to enhance low-light image quality for autonomous driving applications. Specifically, we employ a multi-condition controlled diffusion model. LightDiff works without any human-collected paired data, leveraging a dynamic data degradation process instead. It incorporates a novel multi-condition adapter that adaptively controls the input weights from different modalities, including depth maps, RGB images, and text captions, to effectively illuminate dark scenes while maintaining context consistency. Furthermore, to align the enhanced images with the detection model's knowledge, LightDiff employs perception-specific scores as rewards to guide the diffusion training process through reinforcement learning. Extensive experiments on the nuScenes dataset demonstrate that LightDiff can significantly improve the performance of several state-of-the-art 3D detectors in night-time conditions while achieving high visual quality scores, highlighting its potential to safeguard autonomous driving.
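
The "dynamic data degradation process" mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' pipeline, which builds on the low-light-degradation transform cited in the Figure 3 caption. It assumes a much simpler model in which each daytime pixel is darkened by a randomly drawn gamma and gain and then perturbed with Gaussian noise. The function name `degrade` and every parameter range are hypothetical, chosen only for illustration.

```python
import random

def degrade(image, rng, gamma_range=(2.0, 3.5), gain_range=(0.1, 0.3), noise_std=0.02):
    """Darken a daytime image given as rows of floats in [0, 1].

    Assumed toy model (not the paper's transform):
        out = gain * in ** gamma + gaussian_noise
    Parameters are re-drawn on every call, so the same daytime image
    yields a different synthetic night image each time.
    """
    gamma = rng.uniform(*gamma_range)  # gamma > 1 pushes mid-tones toward black
    gain = rng.uniform(*gain_range)    # gain < 1 lowers the overall exposure
    dark = []
    for row in image:
        new_row = []
        for value in row:
            noisy = gain * (value ** gamma) + rng.gauss(0.0, noise_std)
            new_row.append(min(1.0, max(0.0, noisy)))  # clamp to the valid range
        dark.append(new_row)
    return dark

rng = random.Random(0)  # fixed seed so the example is reproducible
day = [[0.2, 0.5, 0.9], [0.7, 0.4, 0.1]]
night = degrade(day, rng)  # (night, day) forms one synthetic training pair
```

Because the degraded image is derived from the daytime image, the pair is aligned pixel for pixel, which is what lets a model train without human-collected night–day pairs.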

Paper Structure

This paper contains 15 sections, 8 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Nighttime driving scenarios pose a greater risk of fatal accidents than daytime ones. The fatality rate at night is much higher [ashraf2019catastrophic]. This paper aims to enhance nighttime images to improve overall driving safety at night.
  • Figure 2: The architecture of our Lighting Diffusion model (LightDiff). During the training stage, a Training Data Generation pipeline enables the acquisition of triple-modality data without any human-collected paired data. Our LightDiff employs a Multi-Condition Adapter to dynamically weight multiple conditions, coupled with LiDAR and Distribution Reward Modeling (LDRM), allowing for perception-oriented control.
  • Figure 3: The pipeline of our Training Data Generation. The low-light-degradation transform [cui2021multitask] is applied only during the training stage. The trained depth estimation network is frozen and used in both the training and testing stages of our lighting diffusion model.
  • Figure 4: Illustration of the Recurrent Lighting Inference. It is designed to enhance the precision of generating text prompts and depth maps, thereby mitigating adverse effects on dark images.
  • Figure 5: Visual comparison on the example nighttime images in the nuScenes validation set.
  • ...and 4 more figures
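
The Figure 2 caption describes a Multi-Condition Adapter that dynamically weights multiple conditions. One common way to realize such weighting, shown here only as an illustration, is to score each modality's feature vector, normalize the scores with a softmax, and take the weighted sum of the features. The scoring rule (mean activation), the feature values, and the names `softmax` and `fuse_conditions` are assumptions made for this sketch. The paper's adapter is a learned module whose details are not given in this section.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_conditions(features):
    """Fuse per-modality feature vectors with input-dependent weights.

    features: dict mapping a modality name to a list of floats; all lists
    must have the same length. The score used here is the mean activation,
    a stand-in for what would be a learned scoring network.
    """
    names = list(features)
    scores = [sum(features[n]) / len(features[n]) for n in names]
    weights = softmax(scores)  # weights are positive and sum to 1
    dim = len(features[names[0]])
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused

# Hypothetical features: a dark RGB frame carries little signal,
# while the depth map and the text caption remain informative.
features = {
    "rgb": [0.1, 0.2, 0.1, 0.0],
    "depth": [0.8, 0.6, 0.7, 0.9],
    "text": [0.4, 0.5, 0.3, 0.4],
}
weights, fused = fuse_conditions(features)
```

In this toy setting the weakest modality (the dark RGB frame) receives the smallest weight, which mirrors the intuition that the adapter should lean on depth and text when the image itself is uninformative.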