Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras

Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu

TL;DR

Uni-ISP tackles the scalability challenge of end-to-end learned ISPs by unifying the learning of inverse and forward ISP across multiple cameras through device-aware embeddings that interact with a shared encoder-decoder backbone via a Device-aware Embedding Interaction Module. The framework employs a dual training scheme (self-camera and cross-camera) and a frequency bias correction loss to handle alignment across synchronized cross-camera data, supported by the FiveCam dataset of 2,464 synchronized sRGB-RAW pairs from five smartphones. Empirical results show improvements of $+1.5$ dB in PSNR for inverse ISP and $+2.4$ dB in PSNR for forward ISP, along with capabilities in photographic appearance transfer, inter/extrapolation, and zero-shot image forensics, demonstrating practical utility for multi-device imaging pipelines. The work provides a scalable path toward versatile, cross-device image processing, enabling new applications and forensic analyses that leverage shared ISP behaviors across cameras.

Abstract

Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or the inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining a separate ISP for each camera is not sustainable in the long term: per-camera ISPs inherently lack versatility and adapt poorly to multiple camera models. In this paper, we propose a novel pipeline, Uni-ISP, which unifies the learning of ISPs from multiple cameras, offering an accurate and versatile processor for multiple camera models. At its core, Uni-ISP leverages device-aware embeddings, learned jointly with the inverse/forward ISPs, and a special training scheme. By doing so, Uni-ISP not only improves the performance of inverse/forward ISPs but also unlocks a variety of new applications inaccessible to existing learned ISPs. Moreover, since no existing dataset is synchronously captured by multiple cameras for training, we construct a real-world 4K dataset, FiveCam, comprising more than 2,400 pairs of sRGB-RAW images synchronously captured by five smartphones. We conducted extensive experiments demonstrating Uni-ISP's accuracy in inverse/forward ISPs (with improvements of +1.5 dB/+2.4 dB PSNR), its versatility in enabling new applications, and its adaptability to new camera models.
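To put the reported PSNR gains in context, the following minimal sketch computes PSNR and converts a dB gain into the equivalent reduction in mean squared error. It assumes the standard definition PSNR = 10 log10(peak^2 / MSE). The function names and toy pixel values are illustrative and are not taken from the paper or its evaluation code.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)

# Toy data: hypothetical pixel values in [0, 1], not from the paper.
reference = [0.20, 0.40, 0.60, 0.80]
estimate = [0.21, 0.38, 0.61, 0.79]
print(f"toy PSNR: {psnr(reference, estimate):.2f} dB")
print(f"+1.5 dB -> MSE multiplied by {mse_ratio_for_gain(1.5):.3f}")
print(f"+2.4 dB -> MSE multiplied by {mse_ratio_for_gain(2.4):.3f}")
```

Under this definition, a +1.5 dB gain corresponds to roughly 29% lower MSE and a +2.4 dB gain to roughly 42% lower MSE, assuming the same peak value in both measurements.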

Paper Structure

This paper contains 39 sections, 10 equations, 14 figures, and 4 tables.

Figures (14)

  • Figure 1: We propose Uni-ISP, a model that learns the inverse and forward ISP behaviors of multiple cameras simultaneously. By leveraging the characteristics shared across camera ISPs, our method achieves higher performance in inverse and forward ISP (A) than previous learned ISP methods, each of which is tailored to a single camera. Meanwhile, the device-aware property of Uni-ISP enables new cross-camera applications for a learned ISP model, including photographic appearance transfer (B and C), inter/extrapolation (D), and zero-shot image forensics (E and F).
  • Figure 2: The model design of Uni-ISP. Uni-ISP contains two modules, the inverse ISP module $g$ and the forward ISP module $h$, which share the same structure. For visual simplicity, we draw the inverse ISP module $g$ as a thumbnail; its inner structure is the same as that of the forward ISP module $h$. The device-aware embeddings are optimizable parameters; the embedding of the selected device interacts with the bottleneck features via the Device-aware Embedding Interaction Module (DEIM) during training or inference.
  • Figure 3: Illustration of the frequency bias in a dataset warped using an optical flow method. The interpolation performed during warping makes images look blurry compared to the originals, removing their high-frequency components.
  • Figure 4: A preview of three scenes in our new dataset (left) and our capture devices (right). Each scene includes synchronized sRGB-RAW pairs from five smartphone cameras: Apple iPhone 14 Pro Max, Google Pixel 6 Pro, Huawei P40, Samsung Galaxy S20, and Xiaomi Mi 12. The RAW images are visualized as XYZ images here, which can be converted back to RAW without loss.
  • Figure 5: Two scenes comparing Uni-ISP with other methods on the inverse ISP task. The difference maps between each model's prediction and the ground truth are shown in the second and fourth rows. For better visualization, the brightness of the XYZ images is increased by 50% to make the content easier to observe.
  • ...and 9 more figures
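
The frequency bias described in the Figure 3 caption can be illustrated with a 1-D toy example. The sketch below resamples a signal at a half-pixel offset using linear interpolation, standing in for sub-pixel warping. It is not the paper's optical-flow pipeline or its frequency bias correction loss; the helper names `shift_linear` and `mean_abs` and the test signals are hypothetical.

```python
def shift_linear(signal, offset):
    """Resample a 1-D signal at positions i + offset with linear interpolation.

    Positions outside the signal are clamped to the nearest border sample.
    """
    n = len(signal)
    out = []
    for i in range(n):
        x = min(max(i + offset, 0.0), n - 1.0)
        left = int(x)
        right = min(left + 1, n - 1)
        w = x - left  # weight of the right-hand neighbour
        out.append((1.0 - w) * signal[left] + w * signal[right])
    return out

def mean_abs(signal):
    """Average magnitude of the samples."""
    return sum(abs(v) for v in signal) / len(signal)

# Highest-frequency pattern a sample grid can hold: alternating +1 / -1.
fine_detail = [1.0 if i % 2 == 0 else -1.0 for i in range(16)]
# Slowly varying pattern: a linear ramp from 0 to 1.
smooth = [i / 15.0 for i in range(16)]

warped_detail = shift_linear(fine_detail, 0.5)
warped_smooth = shift_linear(smooth, 0.5)

print("fine detail, mean |value| before/after:",
      mean_abs(fine_detail), mean_abs(warped_detail))
print("smooth ramp, mean |value| before/after:",
      mean_abs(smooth), mean_abs(warped_smooth))
```

In this toy, the half-pixel shift averages each pair of neighbouring samples, so the alternating pattern cancels to zero everywhere except at the clamped border, while the ramp is only shifted slightly. Warped targets therefore carry less high-frequency content than the originals, which is the kind of mismatch a model trained directly on such targets could inherit.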