Learning Degradation-Independent Representations for Camera ISP Pipelines
Yanhui Guo, Fangzhou Luo, Xiaolin Wu
TL;DR
This work tackles degradations in camera ISP pipelines by learning degradation-independent representations (DiR) that generalize to unseen degradations. It introduces DiRNet, which extracts a shared degradation-free latent representation through multi-view mutual information maximization and learns a degradation-free reference (DfR) from high-quality images. An alignment network then refines the baseline DiR $r^{(0)}$ into a task-ready representation $r^{+}$, guided by a pilot representation $r^{\rightarrow}$ derived from the degraded inputs; this design enables joint optimization with downstream tasks. The approach shows strong generalization and improves performance on image restoration, object detection, and instance segmentation under both synthetic and real-world ISP degradations, which points to practical value for robust machine perception with real cameras.
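The TL;DR states that DiRNet obtains its shared latent by maximizing mutual information across multiple views. One common way to make such an objective trainable is the InfoNCE contrastive loss, since $\log N - \mathcal{L}_{\text{InfoNCE}}$ is a lower bound on the mutual information between two views for a batch of $N$ pairs. The sketch assumes that each "view" is an encoder embedding of the same scene under a different degradation, so matching rows form positive pairs and all other rows act as negatives. The use of InfoNCE, the function names `info_nce` and `mi_lower_bound`, and the default temperature are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.1) -> float:
    """One-directional InfoNCE loss (assumed stand-in for a multi-view MI objective).

    view_a[i] and view_b[i] are embeddings of the same scene under two different
    degradations (a positive pair); every view_b[j] with j != i is a negative.
    """
    assert view_a and len(view_a) == len(view_b) and temperature > 0
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)
    total = 0.0
    for i in range(n):
        # Cosine similarity of anchor i against every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct "class".
        total += log_sum_exp - logits[i]
    return total / n


def mi_lower_bound(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.1) -> float:
    """InfoNCE bound on mutual information: I(A; B) >= log(N) - L_InfoNCE."""
    return math.log(len(view_a)) - info_nce(view_a, view_b, temperature)
```

Swapping the rows of the second view breaks the positive pairs, so the loss rises sharply; an encoder whose embeddings ignore the degradation keeps matching rows close and pushes the bound toward $\log N$.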
Abstract
The image signal processing (ISP) pipeline plays a fundamental role in digital cameras: it converts raw Bayer sensor data into RGB images. However, ISP-generated images usually suffer from imperfections caused by compounded degradations that stem from sensor noise, demosaicing noise, compression artifacts, and possibly the adverse effects of erroneous ISP hyperparameter settings such as ISO and gamma values. In a general sense, all of these ISP imperfections can be regarded as degradations. The highly complex mechanisms of ISP degradations, some of which are even unknown, pose great challenges to the generalization capability of deep neural networks (DNNs) for image restoration and to their adaptability to downstream tasks. To tackle these issues, we propose a novel DNN approach that learns degradation-independent representations (DiR) by refining a baseline representation learned through self-supervision. The proposed DiR learning technique has remarkable domain generalization capability; consequently, as verified in our experiments, it outperforms state-of-the-art methods across various downstream tasks, including blind image restoration, object detection, and instance segmentation.
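To make the compounded ISP degradations concrete, here is a toy simulation of a few pipeline stages applied to an RGGB Bayer mosaic: ISO-scaled Gaussian sensor noise, a naive 2x2 "superpixel" demosaic, gamma encoding, and 8-bit quantization. This is a minimal sketch under simplifying assumptions: real sensor noise is signal-dependent, real ISPs use far more elaborate demosaicing, and quantization here is only a crude stand-in for compression (no JPEG coding is modeled). All function and parameter names, such as `toy_isp` and `base_sigma`, are hypothetical and not taken from the paper.

```python
import random
from typing import List, Tuple

Raw = List[List[float]]                   # H x W Bayer mosaic, values in [0, 1]
Image = List[List[Tuple[float, ...]]]     # rows of RGB tuples


def add_sensor_noise(raw: Raw, iso: float, base_sigma: float = 0.01, seed: int = 0) -> Raw:
    """Add Gaussian noise whose std grows linearly with ISO (a simplifying assumption)."""
    rng = random.Random(seed)
    sigma = base_sigma * iso / 100.0
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in raw]


def demosaic_superpixel(raw: Raw) -> Image:
    """Collapse each RGGB 2x2 block into one RGB pixel (halves the resolution)."""
    h, w = len(raw), len(raw[0])
    assert h % 2 == 0 and w % 2 == 0, "mosaic dimensions must be even"
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            r = raw[y][x]                                  # top-left: red
            g = 0.5 * (raw[y][x + 1] + raw[y + 1][x])      # two green sites
            b = raw[y + 1][x + 1]                          # bottom-right: blue
            row.append((r, g, b))
        out.append(row)
    return out


def apply_gamma(img: Image, gamma: float) -> Image:
    """Gamma-encode linear values; gamma > 1 brightens mid-tones."""
    return [[tuple(c ** (1.0 / gamma) for c in px) for px in row] for row in img]


def quantize_8bit(img: Image) -> Image:
    """Round to 8-bit integers, a crude stand-in for lossy output coding."""
    return [[tuple(round(c * 255) for c in px) for px in row] for row in img]


def toy_isp(raw: Raw, iso: float = 100, gamma: float = 2.2, seed: int = 0) -> Image:
    """Chain the toy stages: noise -> demosaic -> gamma -> quantization."""
    noisy = add_sensor_noise(raw, iso, seed=seed)
    rgb = demosaic_superpixel(noisy)
    return quantize_8bit(apply_gamma(rgb, gamma))
```

Varying `iso`, `gamma`, and `seed` while keeping the raw scene fixed gives several differently degraded renderings of the same content, one plausible way to obtain degraded inputs for experiments of this kind.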
