Modular Neural Image Signal Processing

Mahmoud Afifi, Zhongling Wang, Ran Zhang, Michael S. Brown

TL;DR

This work introduces a fine-grained modular neural ISP that replaces monolithic end-to-end mappings with interpretable, independently trainable stages: raw enhancement, color correction, photofinishing (gain, GTM, LTM, chroma, gamma), guided upsampling, and detail enhancement. The framework enables camera-agnostic rendering, multiple picture styles, and an interactive photo-editing tool, including the ability to embed raw data into final JPEGs for unlimited post-editable re-rendering. The denoiser is trained on pseudo ground truth, while color correction and photofinishing rely on lightweight networks that predict stage-specific parameters, trained with a combined loss that balances fidelity and perceptual quality. Experiments on the S24 dataset demonstrate state-of-the-art results with moderate parameter counts. Cross-camera generalization is supported by generic denoisers and cross-camera AWB models, and a user study shows strong perceptual preferences for the method's output. The paper also discusses practical considerations such as artifact mitigation, data misalignment, and processing sRGB inputs via linearization, all of which matter for scalable, editable mobile imaging pipelines.

Abstract

This paper presents a modular neural image signal processing (ISP) framework that processes raw inputs and renders high-quality display-referred images. Unlike prior neural ISP designs, our method introduces a high degree of modularity, providing full control over multiple intermediate stages of the rendering process. This modular design not only achieves high rendering accuracy but also improves scalability, debuggability, generalization to unseen cameras, and flexibility to match different user-preference styles. To demonstrate the advantages of this design, we built a user-interactive photo-editing tool that leverages our neural ISP to support diverse editing operations and picture styles. The tool is carefully engineered to take advantage of the high-quality rendering of our neural ISP and to enable unlimited post-editable re-rendering. Our method is a fully learning-based framework with variants of different capacities, all of moderate size (ranging from ~0.5 M to ~3.9 M parameters for the entire pipeline), and consistently delivers competitive qualitative and quantitative results across multiple test sets. Watch the supplemental video at: https://youtu.be/ByhQjQSjxVM

Paper Structure

This paper contains 85 sections, 57 equations, 49 figures, 32 tables, 1 algorithm.

Figures (49)

  • Figure 1: We present a modular neural image signal processing (ISP) framework that offers full control over every stage of the pipeline and can handle unseen cameras without requiring re-training. On top of this framework, we built a user-interactive tool that supports post-editable re-rendering, allowing users to re-process saved outputs with different picture styles and manual adjustments. The shown image was captured in raw format using the iPhone 13 main camera, then denoised and processed by our modular ISP, with intermediate stages and multiple picture-style and manual-adjustment results displayed. None of our models were trained on data from iPhone cameras.
  • Figure 2: Overview of our modular framework. The pipeline begins with image denoising, followed by color correction to map the denoised raw image to the linear sRGB space. The photofinishing module then processes a downsampled version of the linear sRGB image through five parametric stages, where neural networks predict image-based parameters for each stage: digital gain map, global tone mapping (GTM), local tone mapping (LTM), chroma mapping, and gamma correction. A guided upsampling step, using the full-resolution linear sRGB image as guidance, reconstructs the full-resolution photofinishing output, which is then refined by a detail-enhancement stage to produce the final image. The shown example is from the S24 dataset [s24].
  • Figure 3: User-interactive photo-editing tool built on our modular ISP, providing full control over the rendering process, picture styles, and editing options. The interface supports selecting or interpolating between styles and adjusting white balance, exposure, color, and overall appearance. See the supplementary material (Sec. \ref{sec:gui}) and the supplemental video (https://youtu.be/ByhQjQSjxVM) for details.
  • Figure 4: Qualitative comparison between our method and recent neural ISP methods (ISPDiffuser [ispdiffuser], LiteISP [lite-isp], and ParamISP [paramisp]) on an example from the S24 test set [s24]. Results are shown for the default picture style (Style #0) and the remaining artistic styles (Styles #1–5). PSNR values with respect to the ground truth are shown in the lower-left corner of each image.
  • Figure 5: Comparison among Project Indigo [adobe_indigo_2025], the iPhone native camera ISP, and our method (using the generic denoiser and cross-camera auto white balance). The image was captured using the iPhone 13 Pro Max main camera.
  • ...and 44 more figures
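
The stage ordering described in the Figure 2 caption can be illustrated as a chain of independent functions whose intermediate outputs are all kept. The following is a minimal sketch in plain Python, not the authors' implementation. All function names, the tone curve, and the parameter values are assumptions made for illustration: the paper predicts per-image parameters with neural networks, whereas this sketch uses hand-set constants, a scalar gain instead of a gain map, and a simple rational curve as a stand-in for GTM. Denoising, LTM, guided upsampling, and detail enhancement are omitted because they need spatial context that this per-pixel sketch does not model.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[float, ...]            # one (r, g, b) triple
Image = List[Pixel]                  # flattened image, one entry per pixel
Stage = Callable[[Image], Image]


def color_correct(matrix: List[List[float]]) -> Stage:
    """Apply a 3x3 matrix (hypothetical values) mapping raw RGB to linear sRGB."""
    def run(img: Image) -> Image:
        return [tuple(sum(row[c] * px[c] for c in range(3)) for row in matrix)
                for px in img]
    return run


def digital_gain(gain: float) -> Stage:
    """Scalar gain; the paper describes a per-pixel gain map instead."""
    return lambda img: [tuple(gain * v for v in px) for px in img]


def global_tone_map(s: float) -> Stage:
    """Stand-in global curve f(v) = (1 + s) v / (1 + s v); an assumption, not the paper's GTM."""
    return lambda img: [tuple((1.0 + s) * v / (1.0 + s * v) for v in px) for px in img]


def chroma_map(saturation: float) -> Stage:
    """Scale each channel's distance from the Rec. 709 luma by a fixed factor."""
    def run(img: Image) -> Image:
        out = []
        for r, g, b in img:
            luma = 0.2126 * r + 0.7152 * g + 0.0722 * b
            out.append(tuple(luma + saturation * (v - luma) for v in (r, g, b)))
        return out
    return run


def gamma_correct(gamma: float) -> Stage:
    """Clip to [0, 1], then apply v ** (1 / gamma)."""
    return lambda img: [tuple(min(1.0, max(0.0, v)) ** (1.0 / gamma) for v in px)
                        for px in img]


def build_default_stages(gain: float = 1.2) -> List[Tuple[str, Stage]]:
    """Ordered (name, stage) pairs; every parameter here is a hand-set placeholder."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [
        ("color_correction", color_correct(identity)),
        ("gain", digital_gain(gain)),
        ("gtm", global_tone_map(1.0)),
        ("chroma", chroma_map(1.1)),
        ("gamma", gamma_correct(2.2)),
    ]


def run_pipeline(img: Image, stages: List[Tuple[str, Stage]]) -> Dict[str, Image]:
    """Run the stages in order and keep every intermediate result by stage name."""
    outputs: Dict[str, Image] = {}
    current = img
    for name, stage in stages:
        current = stage(current)
        outputs[name] = current
    return outputs


if __name__ == "__main__":
    image = [(0.5, 0.5, 0.5), (0.2, 0.4, 0.1)]
    results = run_pipeline(image, build_default_stages())
    for name, out in results.items():
        print(name, out)
```

Because each stage is a separate function with its own parameters, one stage can be changed (for example, `build_default_stages(gain=1.5)`) and the chain re-run without touching the others, which is the kind of per-stage control the captions describe.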