RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation

Christoph Reinders, Radu Berdan, Beril Besbinar, Junji Otsuka, Daisuke Iso

TL;DR

This work proposes a novel diffusion-based method for generating RAW images guided by RGB images. An RGB-guidance module extracts features from the RGB input, and RGB-guided residual blocks incorporate these features into the reverse diffusion process at several resolutions, yielding high-fidelity RAW images.

Abstract

Current deep learning approaches in computer vision primarily focus on RGB data, sacrificing information. In contrast, RAW images offer a richer representation, which is crucial for precise recognition, particularly in challenging conditions like low-light environments. The resultant demand for comprehensive RAW image datasets contrasts with the labor-intensive process of creating specific datasets for individual sensors. To address this, we propose a novel diffusion-based method for generating RAW images guided by RGB images. Our approach integrates an RGB-guidance module for feature extraction from RGB inputs, then incorporates these features into the reverse diffusion process with RGB-guided residual blocks across various resolutions. This approach yields high-fidelity RAW images, enabling the creation of camera-specific RAW datasets. Our RGB2RAW experiments on four DSLR datasets demonstrate state-of-the-art performance. Moreover, RAW-Diffusion is highly data-efficient, achieving strong performance with 25 training samples or even fewer. We extend our method to create the BDD100K-RAW and Cityscapes-RAW datasets, demonstrating its effectiveness for object detection in RAW imagery and significantly reducing the number of RAW images required.

Paper Structure

This paper contains 37 sections, 3 equations, 7 figures, and 16 tables.

Figures (7)

  • Figure 1: RAW-Diffusion enables the generation of high-fidelity RAW images by iterative denoising of a noisy RAW input through an RGB-guidance module and RGB-guided residual blocks. RAW-Diffusion is the first successful diffusion-based method for RAW generation that outperforms state-of-the-art methods.
  • Figure 2: The RAW-Diffusion architecture consists of an RGB-guidance module for creating guidance features and an encoder for processing noisy RAW inputs. The guidance features are then integrated into both the bottleneck and decoder with RGB-guided residual blocks, modulating the diffusion features to reconstruct the RAW image.
  • Figure 3: Qualitative results on FiveK (top) and NOD (bottom). The reconstructed RAW image and the error map are presented for each method. The RAW images are shown with a gamma correction of $1/2.2$ for visualization.
  • Figure 4: Qualitative results on NOD Nikon (top) and Sony (bottom) for training on the RGB and RAW dataset and a combination with BDD100K-RAW generated by RAW-Diffusion. Please refer to the supplementary material for further qualitative analysis.
  • Figure 5: Analysis of different probabilities $p_{\text{gen}}$. The performance improves with increasing probability of sampling from Cityscapes-RAW and BDD100K-RAW, respectively, reaching a peak before training exclusively on the generated datasets.
  • ...and 2 more figures
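
Two operations mentioned in the captions above can be illustrated with a minimal sketch: the gamma correction of $1/2.2$ used to display RAW images (Figure 3) and sampling from a generated dataset with probability $p_{\text{gen}}$ (Figure 5). This is not the authors' code. The function names `gamma_for_display` and `sample_mixed` are hypothetical, RAW values are assumed to be normalized to [0, 1], and the per-sample draw is only one plausible reading of how $p_{\text{gen}}$ might be applied.

```python
import random


def gamma_for_display(raw_values, gamma=1 / 2.2):
    """Brighten linear RAW intensities for viewing.

    Assumption: values are normalized to [0, 1]; anything outside is clipped.
    """
    return [min(max(v, 0.0), 1.0) ** gamma for v in raw_values]


def sample_mixed(real_items, generated_items, p_gen, rng):
    """Draw one training item.

    Assumption: each draw picks the generated dataset with probability
    p_gen and the real dataset otherwise.
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must be in [0, 1]")
    source = generated_items if rng.random() < p_gen else real_items
    return rng.choice(source)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same items
    real = ["real_0", "real_1"]  # placeholder identifiers, not real data
    generated = ["gen_0", "gen_1", "gen_2"]
    print(gamma_for_display([0.0, 0.25, 1.0]))
    print([sample_mixed(real, generated, p_gen=0.5, rng=rng) for _ in range(8)])
```

With `p_gen=0.0` every draw comes from the real data, and with `p_gen=1.0` every draw comes from the generated data, which corresponds to the "training exclusively on the generated datasets" end of the range described in Figure 5.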