Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras
Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu
TL;DR
Uni-ISP tackles the scalability challenge of end-to-end learned ISPs by unifying the learning of inverse and forward ISPs across multiple cameras. Device-aware embeddings interact with a shared encoder-decoder backbone through a Device-aware Embedding Interaction Module. The framework combines a dual training scheme (self-camera and cross-camera) with a frequency bias correction loss to handle alignment across synchronized cross-camera data, and it is trained on the FiveCam dataset of 2,464 synchronized sRGB-RAW pairs from five smartphones. Experiments show PSNR improvements of $+1.5$ dB for inverse ISP and $+2.4$ dB for forward ISP, and the model also supports photographic appearance transfer, interpolation and extrapolation, and zero-shot image forensics. Because a single model captures ISP behavior across cameras, the approach offers a scalable route to cross-device image processing and forensic analysis.
Abstract
Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or the inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining an individual ISP for each camera is not sustainable in the long term. Such per-camera ISPs inherently lack versatility and adapt poorly to multiple camera models. In this paper, we propose a novel pipeline, Uni-ISP, which unifies the learning of ISPs from multiple cameras and offers an accurate and versatile processor for multiple camera models. The core of Uni-ISP is the use of device-aware embeddings, learned through inverse/forward ISP training, together with a special training scheme. By doing so, Uni-ISP not only improves the performance of inverse/forward ISPs but also unlocks a variety of new applications inaccessible to existing learned ISPs. Moreover, since there is no dataset synchronously captured by multiple cameras for training, we construct a real-world 4K dataset, FiveCam, comprising more than 2,400 pairs of sRGB-RAW images synchronously captured by five smartphones. We conducted extensive experiments demonstrating Uni-ISP's accuracy in inverse/forward ISPs (with improvements of +1.5 dB/+2.4 dB PSNR, respectively), its versatility in enabling new applications, and its adaptability to new camera models.
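
To put the reported PSNR gains in context, the following is a minimal sketch of the conventional PSNR definition and of what a given dB gain implies for mean squared error. It is not taken from the paper: the function names `psnr` and `mse_ratio_for_gain` are illustrative, pixel values are assumed to be normalized to [0, 1], and the paper's exact evaluation protocol may differ.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    Assumes values are normalized so that `peak` is the maximum possible
    intensity (1.0 here; this normalization is an assumption).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of `gain_db` (peak fixed)."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, and with the peak value held fixed, a +1.5 dB gain corresponds to roughly 29% lower mean squared error, and a +2.4 dB gain to roughly 42% lower.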
