
ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Yang Ren, Hai Jiang, Menglong Yang, Wei Li, Shuaicheng Liu

TL;DR

ISPDiffuser tackles the RAW-to-sRGB mapping problem by decoupling detail reconstruction from color mapping. It introduces a texture-aware diffusion model to refine grayscale details and a histogram-guided color consistency module to enforce accurate, DSLR-like colors. Both are optimized through a two-stage training scheme with dedicated losses $L_{con}$, $L_{tel}$, and $L_{ccl}$. On the ZRR [PyNet] and MAI [MAI] benchmarks, it achieves state-of-the-art results on both perceptual and quantitative metrics, and a user study corroborates its superior visual quality. The approach offers a practical pathway to DSLR-quality sRGB outputs from mobile RAW data, and could improve ISP pipelines when paired with efficient inference strategies.

Abstract

RAW-to-sRGB mapping, or the simulation of the traditional camera image signal processor (ISP), aims to generate DSLR-quality sRGB images from raw data captured by smartphone sensors. Although existing learning-based methods achieve results comparable to sophisticated handcrafted camera ISP solutions, they still struggle with detail disparity and color distortion. In this paper, we present ISPDiffuser, a diffusion-based decoupled framework that splits the RAW-to-sRGB mapping into detail reconstruction in grayscale space and color consistency mapping from grayscale to sRGB. Specifically, we propose a texture-aware diffusion model that leverages the generative ability of diffusion models to focus on local detail recovery, together with a texture enrichment loss that encourages the diffusion model to generate more intricate texture details. Subsequently, we introduce a histogram-guided color consistency module that uses the color histogram as guidance to learn precise color information for grayscale-to-sRGB color consistency mapping, with a color consistency loss designed to constrain the learned color information. Extensive experimental results show that the proposed ISPDiffuser outperforms state-of-the-art competitors both quantitatively and visually. The code is available at https://github.com/RenYangSCU/ISPDiffuser.
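The decoupling described in the abstract can be illustrated with a small pure-Python sketch. The sRGB target is reduced to a grayscale image, which is the space where detail reconstruction happens. Its colors are summarized as normalized per-channel histograms, the kind of statistic the color consistency module uses as guidance. Several choices here are assumptions made for illustration: the BT.601 luma weights, the Gaussian-kernel "soft" histogram, and the L1 histogram distance that stands in for a color consistency penalty. The paper's actual grayscale conversion, histogram design, and color consistency loss may differ. All function names are hypothetical.

```python
import math

# Hypothetical sketch of the decoupling: detail lives in a grayscale image,
# color statistics live in per-channel histograms. The luma weights, soft
# histogram, and L1 penalty are assumptions, not the paper's implementation.

Pixel = tuple[float, float, float]  # (R, G, B), each in [0, 1]


def to_grayscale(pixels: list[Pixel]) -> list[float]:
    """Convert sRGB pixels to a single luma channel (ITU-R BT.601 weights)."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]


def soft_histogram(values: list[float], bins: int = 8, sigma: float = 0.05) -> list[float]:
    """Soft histogram: each value spreads unit mass over bin centers via a Gaussian kernel."""
    centers = [(i + 0.5) / bins for i in range(bins)]
    hist = [0.0] * bins
    for v in values:
        weights = [math.exp(-((v - c) ** 2) / (2 * sigma ** 2)) for c in centers]
        total = sum(weights)
        for i, w in enumerate(weights):
            hist[i] += w / total  # each value contributes a total mass of 1
    n = len(values)
    return [h / n for h in hist]  # normalize so the histogram sums to 1


def color_histograms(pixels: list[Pixel], bins: int = 8) -> list[list[float]]:
    """One normalized soft histogram per RGB channel."""
    return [soft_histogram([p[c] for p in pixels], bins) for c in range(3)]


def histogram_l1(h_pred: list[list[float]], h_gt: list[list[float]]) -> float:
    """L1 distance between per-channel histograms (one possible color penalty)."""
    return sum(
        abs(a - b)
        for ch_pred, ch_gt in zip(h_pred, h_gt)
        for a, b in zip(ch_pred, ch_gt)
    )


# Usage: compare a predicted image's color statistics with the ground truth.
pred = [(0.8, 0.2, 0.1), (0.5, 0.5, 0.5)]
gt = [(0.7, 0.25, 0.1), (0.5, 0.5, 0.5)]
gray_target = to_grayscale(gt)  # grayscale target for detail reconstruction
penalty = histogram_l1(color_histograms(pred), color_histograms(gt))
print(gray_target, penalty)
```

A soft histogram is used in place of hard binning because it varies smoothly with pixel values. That property is what lets a histogram-based term act as a training loss.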

Paper Structure

This paper contains 18 sections, 9 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Visual comparison with the previous state-of-the-art method FourierISP [FourierISP]. Our approach shows better local detail reconstruction (the red boxes highlight content differences between the generated images and the GT images) and better global color consistency mapping.
  • Figure 2: The overall pipeline of our proposed framework. We first employ an encoder $\mathcal{E}(\cdot)$ to map the RAW image $I_{r}$ and the grayscale version $I_{g}$ of the sRGB image into latent features $\mathcal{F}_{r}$ and $\mathcal{F}_{g}$. The encoded feature $\mathcal{F}_{g}$ is the input of the proposed texture-aware diffusion model (TADM), which performs the forward diffusion process on it. Guided by the RAW feature $\mathcal{F}_{r}$, the model generates the reconstructed grayscale feature $\hat{\mathcal{F}}_{g}$ from the noised tensor $\mathbf{x}_{t}$ during training. During inference, $\mathbf{x}_{t}$ is replaced by randomly sampled Gaussian noise $\hat{\mathbf{x}}_{T}$. Finally, we use the proposed histogram-guided color consistency module (HCCM) to colorize the generated $\hat{\mathcal{F}}_{g}$, and pass the result to a decoder $\mathcal{D}(\cdot)$ to produce the final sRGB result $\hat{I}_{s}$.
  • Figure 3: The detailed architecture of our proposed histogram-guided color consistency module.
  • Figure 4: Qualitative comparison of our method and competing methods on the ZRR dataset [PyNet] (row 1) and the MAI dataset [MAI] (row 2). The error maps show the content difference between the generated sRGB images and the GT images; darker is better. Best viewed zoomed in.
  • Figure 5: Score distributions from the user study, where the vertical axis records the rating frequency received from the 26 participants. Our method receives more "best" ratings.
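The forward diffusion step in the Figure 2 caption can be sketched with the closed form of a standard DDPM-style process, $\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$. Here it is applied to a random stand-in for the grayscale latent $\mathcal{F}_{g}$. This is a generic NumPy sketch, not the authors' code. The linear noise schedule, $T = 1000$, the latent shape, and the function names `linear_alpha_bar` and `q_sample` are all assumptions.

```python
import numpy as np

# Generic DDPM-style forward noising, assuming a linear beta schedule.
# The schedule, T, latent shape, and names are assumptions for illustration.


def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s), for t = 0..T-1."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
             rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    Also returns the sampled noise, which a denoiser conditioned on the RAW
    feature would typically be trained to predict (assumed objective).
    """
    noise = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return x_t, noise


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
F_g = rng.standard_normal((4, 32, 32))  # hypothetical latent: 4 channels, 32x32
x_t, eps = q_sample(F_g, t=500, alpha_bar=alpha_bar, rng=rng)

# At inference there is no clean latent to noise: sampling starts from pure noise.
x_T_hat = rng.standard_normal(F_g.shape)
```

$\bar\alpha_t$ decays toward zero as $t$ approaches $T$, so the final noised latent is nearly pure Gaussian noise. This is why inference can start from randomly sampled $\hat{\mathbf{x}}_{T}$ instead of a noised $\mathcal{F}_{g}$.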