Simple Image Signal Processing using Global Context Guidance
Omar Elezabi, Marcos V. Conde, Radu Timofte
TL;DR
This work tackles the global-context deficiency of patch-trained neural ISPs by introducing a Color Module (CMod). CMod encodes full-image guidance $G$ into a $k$-dimensional modification vector $mv = E(G)$ and applies it to RAW patches through an element-wise product, $X_{mv} = X_m \odot mv$, producing color-corrected patches that are subsequently mapped to RGB. The proposed SimpleISP architecture separates global color adjustments (via CMod) from local RGB reconstruction, enabling end-to-end training with full-resolution RAW guidance while remaining computationally efficient. Empirically, incorporating CMod yields substantial improvements (e.g., about $+2$ dB PSNR on several benchmarks) and DSLR-like color reproduction, while using roughly 20× fewer parameters than strong baselines. The approach is validated on the newly created ZRR Small and ISPIW datasets and on RAW super-resolution tasks, demonstrating strong generalization to diverse smartphone data and practical potential for mobile ISP pipelines.
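The two CMod equations above, $mv = E(G)$ and $X_{mv} = X_m \odot mv$, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the encoder here (global average pooling followed by one linear layer), the 4-channel packed RAW layout, and the tensor shapes are all assumptions chosen only to make the data flow concrete. The names `encode_guidance` and `apply_cmod` are hypothetical.

```python
import numpy as np


def encode_guidance(G, W, b):
    """Hypothetical stand-in for the encoder E: maps guidance G to a k-vector.

    G: (H, W, C) full-image guidance (assumed: downsampled, packed RAW).
    W: (C, k) weights, b: (k,) bias of an assumed single linear layer.
    """
    pooled = G.mean(axis=(0, 1))   # global average pool over space -> (C,)
    return pooled @ W + b          # modification vector mv -> (k,)


def apply_cmod(X_m, mv):
    """X_mv = X_m ⊙ mv: scale each of the k channels of a patch by mv."""
    return X_m * mv[None, None, :]  # broadcast mv over the spatial dimensions


rng = np.random.default_rng(0)
k = 8                                  # size of the modification vector
G = rng.random((64, 64, 4))            # assumed 4-channel packed RAW guidance
X_m = rng.random((16, 16, k))          # assumed k-channel patch representation
W = rng.normal(size=(4, k))            # random weights, for illustration only
b = np.ones(k)

mv = encode_guidance(G, W, b)          # depends on the full image only
X_mv = apply_cmod(X_m, mv)             # color-modified patch, same shape as X_m
```

In this sketch `mv` depends only on the guidance `G`, so it could be computed once per image and reused for every patch cropped from that image, which is how a patch-level model would receive global information under these assumptions.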
Abstract
In modern smartphone cameras, the Image Signal Processor (ISP) is the core element that converts the RAW readings from the sensor into perceptually pleasing RGB images for the end users. The ISP is typically proprietary and handcrafted and consists of several blocks such as white balance, color correction, and tone mapping. Deep learning-based ISPs aim to transform RAW images into DSLR-like RGB images using deep neural networks. However, most learned ISPs are trained using patches (small regions) due to computational limitations. Such methods lack global context, which limits their efficacy on full-resolution images and harms their ability to capture global properties such as color constancy or illumination. We make two contributions to address this. First, we propose a novel module that can be integrated into any neural ISP to capture the global context information from the full RAW images. Second, we propose an efficient and simple neural ISP that utilizes our proposed module. Our model achieves state-of-the-art results on different benchmarks using diverse and real smartphone images.
