SpiralDiff: Spiral Diffusion with LoRA for RGB-to-RAW Conversion Across Cameras

Huanjing Yue, Shangbin Xie, Cong Cao, Qian Wu, Lei Zhang, Lei Zhao, Jingyu Yang

Abstract

RAW images preserve superior fidelity and rich scene information compared to RGB, making them essential for tasks in challenging imaging conditions. To alleviate the high cost of data collection, recent RGB-to-RAW conversion methods aim to synthesize RAW images from RGB. However, they overlook two key challenges: (i) the reconstruction difficulty varies with pixel intensity, and (ii) multi-camera conversion requires camera-specific adaptation. To address these issues, we propose SpiralDiff, a diffusion-based framework tailored for RGB-to-RAW conversion with a signal-dependent noise weighting strategy that adapts reconstruction fidelity across intensity levels. In addition, we introduce CamLoRA, a camera-aware lightweight adaptation module that enables a unified model to adapt to different camera-specific ISP characteristics. Extensive experiments on four benchmark datasets demonstrate the superiority of SpiralDiff in RGB-to-RAW conversion quality and its downstream benefits in RAW-based object detection. Our code and model are available at https://github.com/Chuancy-TJU/SpiralDiff.
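As a rough illustration of the camera-aware adaptation idea, the sketch below shows one way a single shared layer could carry a separate low-rank (LoRA) update per camera, selected by an integer camera label. It is a minimal sketch under stated assumptions, not the authors' CamLoRA implementation: the class name `CamLoRALinear`, the use of a plain linear layer, and the `rank`, `alpha`, and initialization values are all hypothetical.

```python
import numpy as np


class CamLoRALinear:
    """Hypothetical linear layer: one shared weight plus one low-rank update per camera."""

    def __init__(self, in_dim, out_dim, num_cameras, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        # Shared base weight used for every camera.
        self.weight = rng.normal(0.0, 0.02, size=(out_dim, in_dim))
        # Standard LoRA initialization: A is random, B is zero,
        # so every camera adapter starts as a no-op.
        self.lora_a = rng.normal(0.0, 0.02, size=(num_cameras, rank, in_dim))
        self.lora_b = np.zeros((num_cameras, out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x, camera_id):
        # x: (batch, in_dim); camera_id: integer label that selects the adapter.
        base = x @ self.weight.T
        delta = (x @ self.lora_a[camera_id].T) @ self.lora_b[camera_id].T
        return base + self.scale * delta


# Usage: the same layer serves three cameras; only the selected adapter is applied.
layer = CamLoRALinear(in_dim=8, out_dim=6, num_cameras=3)
x = np.ones((2, 8))
out = layer(x, camera_id=1)
```

Keeping the base weight shared and storing only two small matrices per camera is what makes this kind of adaptation lightweight: adding a camera adds `rank * (in_dim + out_dim)` parameters per adapted layer rather than a full copy of the model.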

Paper Structure (28 sections, 23 equations, 6 figures, 12 tables, 2 algorithms)

Figures (6)

  • Figure 1: Comparison of noise schedules (top) and visualizations of the noisy $\bm{x}_T$ (bottom) in diffusion: ResShift [yue2023resshift] uses uniform Gaussian noise, whereas our SpiralDiff introduces a signal-dependent noise schedule based on pixel intensity ($\bm{w}_t$).
  • Figure 2: Relationship between RGB pixel intensity and residual magnitude between RGB and RAW images across channels on the FiveK Nikon dataset. Colored lines and shaded regions represent the mean and standard deviation, respectively.
  • Figure 3: Overview of the proposed SpiralDiff with CamLoRA. (a) SpiralDiff introduces a signal-dependent noise schedule via a set of weight maps $\{\bm{w}_t\}_{t=1}^T$ that aligns with the RAW-to-RGB conversion process. The noise level depends on local pixel intensity and diffusion timestep $t$: darker regions ($\triangle$) receive less noise, and brighter regions ($\bigcirc$) receive more. (b) The spiral structure visualizes how noise scales with $\bm{w}_t$ for different pixel intensities. (c) The framework of SpiralDiff. The noisy image $\bm{x}_t$, the RGB image $\bm{y}_0$, and the camera label are fed into the denoising U-Net, which iteratively samples to refine the RAW output. (d) The camera label selects the camera-specific LoRA layer in CamLoRA, enhancing adaptation to each camera's characteristics.
  • Figure 4: Qualitative comparison with state-of-the-art RGB-to-RAW conversion methods on the FiveK dataset (top two rows) and the NOD dataset (bottom two rows). For each result, the left half is the predicted RAW, and the right half is the error map. The proposed SpiralDiff shows better conversion results, especially in bright regions.
  • Figure 5: Residual-intensity relationship on rawpy ISP (FiveK Nikon) and real ISP (iPhone and Samsung).
  • ...and 1 more figure
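
The captions for Figures 1 and 3 describe noise that grows with pixel intensity and timestep. The sketch below illustrates that idea on top of the ResShift forward process, $\bm{x}_t = \bm{x}_0 + \eta_t(\bm{y}_0 - \bm{x}_0) + \kappa\sqrt{\eta_t}\,\bm{\epsilon}$, by replacing the uniform factor $\sqrt{\eta_t}$ with a per-pixel weight map. The specific form of `weight_map`, the floor `w_min`, the value of `kappa`, and the assumption that the RAW and RGB arrays share one shape are all hypothetical choices made for illustration; the paper's actual definition of $\bm{w}_t$ is not given in this section.

```python
import numpy as np


def weight_map(rgb, eta_t, w_min=0.2):
    """Hypothetical w_t: grows with pixel intensity and with the shift eta_t."""
    # Per-pixel intensity in [0, 1], kept as (H, W, 1) so it broadcasts over channels.
    intensity = rgb.mean(axis=-1, keepdims=True)
    return np.sqrt(eta_t) * (w_min + (1.0 - w_min) * intensity)


def forward_sample(x0, y0, eta_t, kappa=2.0, rng=None):
    """ResShift-style shift from x0 toward y0, with noise scaled per pixel by w_t."""
    rng = np.random.default_rng() if rng is None else rng
    w_t = weight_map(y0, eta_t)
    noise = rng.standard_normal(x0.shape)
    return x0 + eta_t * (y0 - x0) + kappa * w_t * noise


# Usage: dark RGB pixels receive a smaller noise weight than bright ones.
dark = np.zeros((4, 4, 3))
bright = np.ones((4, 4, 3))
x0 = np.full((4, 4, 3), 0.3)  # stand-in for a RAW image in [0, 1]
x_t = forward_sample(x0, bright, eta_t=0.5, rng=np.random.default_rng(0))
```

With these illustrative settings, a fully dark pixel gets one fifth of the noise standard deviation of a fully bright pixel at the same timestep, which mirrors the qualitative behavior the captions describe.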