SEL-CIE: Knowledge-Guided Self-Supervised Learning Framework for CIE-XYZ Reconstruction from Non-Linear sRGB Images

Shir Barzel, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch

TL;DR

The paper tackles the reconstruction of device-independent CIE-XYZ color images from non-linear sRGB inputs when paired training data are limited. It introduces SEL-CIE, a three-phase framework that combines supervised learning on paired sRGB2XYZ data, a color-board-guided self-supervised pre-task, and a final supervised refinement. The framework is built on a dual-branch global/local network, optionally initialized with a pre-trained ResNet50 backbone. A trainable balancing parameter and a Delta E 76-based color-space loss enable effective SSL integration. Together they yield higher PSNR and SSIM than existing methods on the sRGB2XYZ benchmark, especially with the ResNet50 backbone (SEL-CIE-RB). These results demonstrate robust color-space reconstruction, with potential impact on color-critical computer vision and medical imaging tasks that require standardized color representations.
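The Delta E 76 color difference behind the color-space loss is the Euclidean distance between two colors after both are converted to CIELAB. The NumPy sketch below computes it for CIE-XYZ inputs. It assumes a D65 reference white normalized to Y = 1, which this summary does not specify. It illustrates the metric only and is not the authors' training loss.

```python
import numpy as np

# Assumed D65 reference white (Y normalized to 1); the paper's white point is not stated here.
D65_WHITE = np.array([0.95047, 1.0, 1.08883])

def xyz_to_lab(xyz, white=D65_WHITE):
    """Convert CIE-XYZ values of shape (..., 3) to CIELAB with the standard CIE formulas."""
    t = np.asarray(xyz, dtype=np.float64) / white
    delta = 6.0 / 29.0
    # Piecewise function from the CIELAB definition: cube root above the
    # threshold, linear segment below it (avoids an infinite slope at 0).
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def delta_e76(xyz_pred, xyz_true, white=D65_WHITE):
    """Delta E 76: Euclidean distance between two colors in CIELAB."""
    diff = xyz_to_lab(xyz_pred, white) - xyz_to_lab(xyz_true, white)
    return np.linalg.norm(diff, axis=-1)
```

In a training loop, the same formulas would be written with an autodiff framework's tensor operations so that gradients flow through the Lab conversion.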

Abstract

Modern cameras typically offer two image states: a minimally processed linear raw RGB image that represents the raw sensor data, and a highly processed non-linear image state, such as sRGB. The CIE-XYZ color space is a device-independent linear space used within the camera pipeline. It can benefit computer vision tasks such as image deblurring and dehazing, as well as color recognition in medical applications, where color accuracy is important. However, images are usually saved in non-linear states, and recovering CIE-XYZ images with conventional methods is not always possible. Classical methods address this by reversing the acquisition pipeline. More recently, supervised learning has been employed, using paired CIE-XYZ and sRGB representations of the same images. However, obtaining a large-scale dataset of CIE-XYZ and sRGB pairs can be challenging. To reduce the reliance on large amounts of paired data, self-supervised learning (SSL) can complement the paired data that is available. This paper proposes a framework that uses SSL methods alongside paired data to reconstruct CIE-XYZ images and re-render sRGB images, outperforming existing approaches. The proposed framework is applied to the sRGB2XYZ dataset.
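For reference, the classical way to reverse the final sRGB encoding is the analytic inverse defined by the sRGB standard (IEC 61966-2-1). First undo the piecewise transfer function, then apply a fixed 3x3 matrix to reach D65-referenced CIE-XYZ. The sketch below assumes the image was produced exactly by that standard encoding. Real camera renderings also apply scene-dependent steps such as white balance, tone curves, and gamut mapping, which this closed form ignores, and that gap is what motivates learned approaches.

```python
import numpy as np

# Linear-sRGB -> CIE-XYZ (D65) matrix from the sRGB standard.
SRGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def srgb_to_linear(srgb):
    """Invert the sRGB transfer function for values in [0, 1]."""
    srgb = np.asarray(srgb, dtype=np.float64)
    # Linear segment near black, power-law segment elsewhere.
    return np.where(srgb <= 0.04045, srgb / 12.92, ((srgb + 0.055) / 1.055) ** 2.4)

def srgb_to_xyz(srgb):
    """Analytic sRGB -> CIE-XYZ for pixels of shape (..., 3)."""
    return srgb_to_linear(srgb) @ SRGB_TO_XYZ.T
```

Applied to an sRGB white pixel `[1, 1, 1]`, this returns the matrix's white point, approximately `[0.9505, 1.0, 1.089]`.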

Paper Structure

This paper contains 10 sections, 6 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Visual comparisons for CIE-XYZ reconstruction and re-rendering. (A) The input sRGB image. (B) The CIE-XYZ image reconstructed by the proposed method. (C) The sRGB image re-rendered from the reconstructed CIE-XYZ image. CIE-XYZ images are scaled by a factor of two for visualization. The input images are from the NUS dataset cheng2014illuminant.
  • Figure 2: Self-supervised training with color boards. The color boards are used to build a pre-task that matches the colors of the patches on each board with their corresponding colors in the CIE-XYZ color space. The first sub-figure shows an example image from the NUS dataset cheng2014illuminant, in which the color board is located using the image metadata. The second sub-figure visualizes the additional metadata that can be used to sample each color board's patches during the pre-task.
  • Figure 3: The training framework consists of three phases. (1) Supervised training on the sRGB2XYZ dataset afifi2021cie to reconstruct CIE-XYZ images, using the loss $L_{s_{srgb}}$ to minimize the sRGB reconstruction error and $L_{s_{cie-xyz}}$ to minimize the CIE-XYZ reconstruction error. (2) SSL on the NUS dataset cheng2014illuminant, using color-board patches with known CIE-XYZ values to further train the network; the loss $L_{ssl}$ minimizes the Delta E 76 discrepancy between the reconstructed patches and their ground-truth CIE-XYZ values. (3) A final phase of supervised training on the sRGB2XYZ dataset with the same losses, $L_{s_{srgb}}$ and $L_{s_{cie-xyz}}$, to refine the network and re-incorporate the paired data.
  • Figure 4: The CIE-XYZ image pipeline network from afifi2021cie with an ImageNet-pre-trained ResNet50 backbone. The network emulates the camera imaging pipeline with two sub-networks that model its global and local processing. The backbone handles feature extraction and representation learning. It is a ResNet50 he2016deep initialized with weights pre-trained on ImageNet russakovsky2015imagenet, a large-scale annotated image dataset. The backbone is placed before the local-processing convolutional neural network (CNN) so that the local branch benefits from the pre-trained features. Its weights are then fine-tuned during the subsequent training phases as the network learns the reconstruction task.
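The color-board pre-task in Figures 2 and 3 can be sketched in two steps. First, average the reconstructed CIE-XYZ values inside each patch region. Then score those averages against the patches' known CIE-XYZ values with Delta E 76. Several details here are hypothetical: the patch boxes, their (y0, y1, x0, x1) layout, and the function name `color_board_loss`. In the paper, patch locations come from the NUS metadata, and the real $L_{ssl}$ would be written in the training framework so that it is differentiable. The CIELAB conversion is repeated so the block stands alone, again assuming a D65 white.

```python
import numpy as np

D65_WHITE = np.array([0.95047, 1.0, 1.08883])  # assumed reference white

def xyz_to_lab(xyz, white=D65_WHITE):
    """Standard CIE-XYZ -> CIELAB conversion for arrays of shape (..., 3)."""
    t = np.asarray(xyz, dtype=np.float64) / white
    d = 6.0 / 29.0
    f = np.where(t > d ** 3, np.cbrt(t), t / (3 * d ** 2) + 4.0 / 29.0)
    return np.stack([116.0 * f[..., 1] - 16.0,
                     500.0 * (f[..., 0] - f[..., 1]),
                     200.0 * (f[..., 1] - f[..., 2])], axis=-1)

def color_board_loss(xyz_image, patch_boxes, patch_xyz_ref):
    """Hypothetical pre-task loss: mean Delta E 76 over color-board patches.

    xyz_image:     (H, W, 3) reconstructed CIE-XYZ image.
    patch_boxes:   list of (y0, y1, x0, x1) boxes, assumed to come from metadata.
    patch_xyz_ref: (P, 3) known CIE-XYZ value of each patch.
    """
    # Average each patch region down to a single XYZ color.
    sampled = np.stack([xyz_image[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
                        for (y0, y1, x0, x1) in patch_boxes])
    diff = xyz_to_lab(sampled) - xyz_to_lab(np.asarray(patch_xyz_ref))
    # Delta E 76 per patch, then the mean over all patches.
    return float(np.linalg.norm(diff, axis=-1).mean())
```

If every reconstructed patch matches its reference, the loss is 0. Any color shift in a patch increases it in proportion to the perceptual (CIELAB) distance.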