Simple Image Signal Processing using Global Context Guidance

Omar Elezabi, Marcos V. Conde, Radu Timofte

TL;DR

This work tackles the global-context deficiency of patch-trained neural ISPs by introducing a Color Module (CMod) that encodes full-image guidance into a $k$-dimensional modification vector $mv = E(G)$ and applies it to RAW patches via $X_{mv} = X_m \odot mv$, producing color-corrected patches subsequently mapped to RGB. The proposed SimpleISP architecture separates global color adjustments (via CMod) from local RGB reconstruction, enabling end-to-end training with full-resolution RAW guidance while remaining computationally efficient. Empirically, incorporating CMod yields substantial improvements (e.g., about $+2$ dB PSNR on several benchmarks) and DSLR-like color reproduction, while dramatically reducing model complexity (roughly 20× fewer parameters than strong baselines). The approach is validated on newly created ZRR Small and ISPIW datasets and on RAW super-resolution tasks, demonstrating strong generalization to diverse smartphone data and practical potential for mobile ISP pipelines.
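The modulation step $X_{mv} = X_m \odot mv$ can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The encoder here is a hypothetical stand-in (global average pooling followed by a linear map), the projections are plain per-pixel linear maps, and the names `encode_guidance`, `apply_cmod`, `W_enc`, `W_in`, and `W_out`, as well as the 4-channel packed RAW layout and $k = 16$, are assumptions made for illustration only.

```python
import numpy as np

def encode_guidance(G, W_enc):
    """Hypothetical stand-in for the encoder E: maps a resized full RAW
    image G of shape (H, W, C) to a k-dimensional modification vector."""
    pooled = G.mean(axis=(0, 1))   # global average pool -> (C,)
    return pooled @ W_enc          # linear map -> (k,)

def apply_cmod(X, mv, W_in, W_out):
    """Apply global guidance to one RAW patch X of shape (h, w, C)."""
    X_m = X @ W_in                 # project patch to modification space -> (h, w, k)
    X_mv = X_m * mv                # channel-wise product, mv broadcast over pixels
    return X_mv @ W_out            # project back to 3 color channels -> (h, w, 3)

rng = np.random.default_rng(0)
C, k = 4, 16                       # assumed: 4-channel packed RAW, k = 16
W_enc = rng.standard_normal((C, k))
W_in = rng.standard_normal((C, k))
W_out = rng.standard_normal((k, 3))

G = rng.random((32, 32, C))        # resized full-resolution RAW guidance
X = rng.random((8, 8, C))          # a small RAW patch from the same image

mv = encode_guidance(G, W_enc)     # computed once per image
out = apply_cmod(X, mv, W_in, W_out)
print(out.shape)                   # (8, 8, 3)
```

Because `mv` depends only on the guidance image `G`, every patch cropped from the same photograph is modulated by the same vector. This is how the sketch mirrors the separation of global color adjustment from local reconstruction.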

Abstract

In modern smartphone cameras, the Image Signal Processor (ISP) is the core element that converts the RAW readings from the sensor into perceptually pleasant RGB images for the end users. The ISP is typically proprietary and handcrafted and consists of several blocks such as white balance, color correction, and tone mapping. Deep learning-based ISPs aim to transform RAW images into DSLR-like RGB images using deep neural networks. However, most learned ISPs are trained using patches (small regions) due to computational limitations. Such methods lack global context, which limits their efficacy on full-resolution images and harms their ability to capture global properties such as color constancy or illumination. First, we propose a novel module that can be integrated into any neural ISP to capture the global context information from the full RAW images. Second, we propose an efficient and simple neural ISP that utilizes our proposed module. Our model achieves state-of-the-art results on different benchmarks using diverse and real smartphone images.

Paper Structure

This paper contains 14 sections, 5 equations, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: (Top) Huawei P20 ISP and (Bottom) our SimpleISP neural network. Both process the same Huawei P20 RAW image.
  • Figure 2: Example of the limitations of learned ISPs trained using patches. Note the inconsistent vignetting and the non-uniform illumination and colors (see the center and the sky).
  • Figure 3: Architecture of our proposed CMod (Top). $E$ is the encoder network that takes the resized full RAW image as input and produces the modification vector. Proj is the projection network that projects the input to the modification space, and then back to the RGB space. The global guidance is applied by a channel-wise multiplication $\odot$ between the projected image and the modification vector.
  • Figure 4: Full pipeline of our proposed ISP model, SimpleISP. First, we feed the RAW image into the CMod module for color reproduction. Then the output of CMod is processed by the full reconstruction network to produce the final RGB output. This network is composed of three blocks B$_i$ inspired by Chen et al. [chen2022simple]. These building blocks (the Baseline block) are illustrated on the right side.
  • Figure 5: Visual ablation study on the effect of our global context module. We use LiteISP [zhang2021learning] as the baseline. We show the results using the Patch (P.) guidance and the Full-resolution Image (FI.) guidance. Our global guidance notably improves the results.
  • ...and 5 more figures