DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP
Amber Yijia Zheng, Yu Zhang, Jun Hu, Raymond A. Yeh, Chen Chen
TL;DR
DarkDiff addresses extreme low-light raw image enhancement by retasking a pre-trained diffusion model to operate within the camera ISP. It introduces an ISP-aware data pipeline, region-based cross-attention conditioning, a content-preserving residual VAE, and a decoder-space reconstruction loss that mitigates color shifts. Together these yield high perceptual quality, as measured by LPIPS, across the SID, ELD, and LRD benchmarks while keeping PSNR and SSIM competitive. Quantitative and qualitative results show DarkDiff outperforming regression-based and diffusion-from-scratch baselines in perceptual fidelity, with ablations validating the necessity of each component. The approach leverages pre-trained diffusion capabilities to reduce data requirements and achieve practical improvements for low-light photography, though it trades off inference speed and depends on the base diffusion model’s strengths.
Abstract
High-quality photography in extreme low-light conditions is challenging but impactful for digital cameras. With advanced computing hardware, traditional camera image signal processor (ISP) algorithms are gradually being replaced by efficient deep networks that enhance noisy raw images more intelligently. However, existing regression-based models often minimize pixel errors and result in oversmoothing of low-light photos or deep shadows. Recent work has attempted to address this limitation by training a diffusion model from scratch, yet those models still struggle to recover sharp image details and accurate colors. We introduce a novel framework to enhance low-light raw images by retasking pre-trained generative diffusion models with the camera ISP. Extensive experiments demonstrate that our method outperforms the state-of-the-art in perceptual quality across three challenging low-light raw image benchmarks.
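The oversmoothing attributed above to pixel-error minimization can be illustrated with a toy example. The sketch below is not part of DarkDiff; it assumes two hypothetical, equally plausible sharp 1-D signals (`sharp_a`, `sharp_b`) for the same noisy input, and shows that the prediction with the lowest expected mean squared error is their flat average rather than either sharp signal.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Hypothetical toy data: two sharp textures that are equally likely
# explanations of the same noisy low-light observation.
sharp_a = [0.0, 1.0, 0.0, 1.0]
sharp_b = [1.0, 0.0, 1.0, 0.0]
targets = [sharp_a, sharp_b]

def expected_mse(pred):
    """Average MSE of one prediction over all plausible targets."""
    return sum(mse(pred, t) for t in targets) / len(targets)

# Element-wise mean of the targets: a flat signal with no texture.
smooth = [sum(vals) / len(vals) for vals in zip(*targets)]

loss_smooth = expected_mse(smooth)   # flat prediction
loss_sharp = expected_mse(sharp_a)   # committing to one sharp texture

print("smooth prediction:", smooth, "expected MSE:", loss_smooth)
print("sharp prediction: ", sharp_a, "expected MSE:", loss_sharp)
```

In this toy setting the flat prediction scores a lower expected pixel error than either sharp candidate, which is one common explanation for why regression-trained enhancers tend to blur fine detail, and why perceptual metrics are reported alongside pixel-error metrics.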
