Learning to Recover Spectral Reflectance from RGB Images

Dong Huo, Jian Wang, Yiming Qian, Yee-Hong Yang

TL;DR

This work tackles spectral reflectance recovery (SRR) from RGB images captured under known illuminations. It combines self-supervised test-time adaptation based on meta-auxiliary learning (MAXL) with a physics-informed architecture that accounts for the camera spectral sensitivity (CSS) subspace and for multiple illuminations. The method jointly learns a primary SRR task and a self-supervised RGB reconstruction auxiliary task, so the trained network can be fine-tuned on each test image without ground-truth reflectance. Results on synthetic and real datasets show improvements over state-of-the-art methods, most notably on real images and with multiple illuminations, and ablations validate the contribution of each architectural and learning component. The approach targets practical SRR when the CSS is unknown and lays groundwork for spectral analysis from standard RGB captures.

Abstract

This paper tackles spectral reflectance recovery (SRR) from RGB images. Since capturing ground-truth spectral reflectance and camera spectral sensitivity is challenging and costly, most existing approaches are trained on synthetic images and use the same parameters for all unseen testing images. This is suboptimal, especially when the trained models are tested on real images, because they never exploit the internal information of the testing images. To address this issue, we adopt a self-supervised meta-auxiliary learning (MAXL) strategy that fine-tunes the well-trained network parameters with each testing image to combine external with internal information. To the best of our knowledge, this is the first work that successfully adapts the MAXL strategy to this problem. Instead of relying on naive end-to-end training, we also propose a novel architecture that integrates the physical relationship between the spectral reflectance and the corresponding RGB images into the network, based on our mathematical analysis. In addition, since the spectral reflectance of a scene is independent of its illumination while the corresponding RGB images are not, we recover the spectral reflectance of a scene from its RGB images captured under multiple illuminations to further reduce the number of unknowns. Qualitative and quantitative evaluations demonstrate the effectiveness of our proposed network and of MAXL. Our code and data are available at https://github.com/Dong-Huo/SRR-MAXL.
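
The abstract refers to a physical relationship between reflectance and RGB values without stating it. A minimal sketch follows, assuming the common discretized image-formation model in which each RGB value is a sum over wavelength bands of CSS times illumination times reflectance. The function name `render_rgb`, the 31-band sampling, and the synthetic spectra are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def render_rgb(reflectance, illuminations, css):
    """Render RGB values for one pixel under M illuminations (assumed model).

    reflectance:   (N,)   spectral reflectance sampled at N bands
    illuminations: (M, N) illumination spectra
    css:           (3, N) camera spectral sensitivity per channel
    returns:       (M, 3) RGB values, one row per illumination
    """
    # I[m, c] = sum_n css[c, n] * illuminations[m, n] * reflectance[n]
    return np.einsum("cn,mn,n->mc", css, illuminations, reflectance)

rng = np.random.default_rng(0)
N = 31                                   # hypothetical number of bands
reflectance = rng.uniform(0, 1, N)       # synthetic reflectance
css = rng.uniform(0, 1, (3, N))          # synthetic CSS
white = np.ones(N)                       # flat stand-in for a white LED
amber = np.linspace(0.2, 1.0, N)         # tilted stand-in for an amber LED

rgb_one = render_rgb(reflectance, white[None, :], css)
rgb_two = render_rgb(reflectance, np.stack([white, amber]), css)

# Stacked linear systems: each illumination contributes 3 equations in N unknowns.
H_one = css * white
H_two = np.vstack([css * white, css * amber])
rank_one = np.linalg.matrix_rank(H_one)
rank_two = np.linalg.matrix_rank(H_two)
print(rgb_one.shape, rgb_two.shape)
print("unconstrained directions:", N - rank_one, "vs", N - rank_two)
```

Under this assumed model the reflectance is the same in both renderings and only the measurements change, so a second illumination adds equations without adding unknowns.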

Paper Structure (18 sections, 1 theorem, 16 equations, 8 figures, 6 tables, 2 algorithms)


Key Result

Theorem 1

All possible solutions of $\hat{\mathbf{R}}$ share the same $\hat{\omega} \hat{\mathbf{R}}_{\hat{\boldsymbol{\mathcal{H}}}}$ component.
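
The symbols $\hat{\omega}$ and $\hat{\boldsymbol{\mathcal{H}}}$ are not defined in this summary, so the sketch below illustrates only the generic linear-algebra fact that a statement of this form usually rests on. It assumes a linear model $\mathbf{I} = \mathbf{H}\mathbf{R}$ with fewer measurements than spectral bands; the matrix `H`, its size, and the random data are hypothetical. Under that assumption, any two solutions differ only by a null-space vector of `H`, so their projections onto the row space of `H` coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 31, 6                        # hypothetical: 31 bands, 6 RGB measurements
H = rng.uniform(0, 1, (K, N))       # assumed forward operator (not the paper's)
R_true = rng.uniform(0, 1, N)       # one reflectance that explains the data
I = H @ R_true                      # observed measurements

H_pinv = np.linalg.pinv(H)
P_row = H_pinv @ H                  # orthogonal projector onto the row space of H

# Build a second, different solution by adding a null-space vector of H.
z = rng.normal(size=N)
null_part = z - P_row @ z           # component of z that H cannot see
R_other = R_true + null_part

same_measurements = np.allclose(H @ R_other, I)
same_row_component = np.allclose(P_row @ R_true, P_row @ R_other)
determined_by_data = np.allclose(P_row @ R_true, H_pinv @ I)
print(same_measurements, same_row_component, determined_by_data)
```

The shared component is fixed by the measurements alone (`H_pinv @ I`), which is why a network only has to learn the remaining, unobserved part in this simplified reading.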

Figures (8)

  • Figure 1: This paper proposes a novel spectral reflectance recovery approach from RGB images, which utilizes meta-auxiliary learning (MAXL) to exploit the internal information from testing images. It also demonstrates that an extra illumination (amber LED) improves performance compared with a single illumination (white LED). The illumination LEDs could come from the True Tone flash [pcmag_truetone] of smartphones (first row). The last row shows the error maps of the recovered results, where Ours and Ours$\dagger$ represent the model without and with the extra illumination, respectively.
  • Figure 2: The left and right figures show a spectral reflectance curve and the illumination spectrum of a white LED, respectively. We can see that discretization loses high-frequency information.
  • Figure 3: Our proposed network architecture for SRR and meta-auxiliary learning. $e^i$ and $d^i$ denote the feature map from the encoder and the decoder, respectively, of scale $i$ ($i\in \{1, 2, 3, 4\}$), $\hat{\mathbf{R}}^i$ is the recovered reflectance of scale $i$ and $\hat{\mathbf{R}}^1$ represents the final recovered result $\hat{\mathbf{R}}$. The RGB image stack $\boldsymbol{\mathcal{I}}$ is downsampled to the corresponding scale before calculating $\hat{\mathbf{R}}_{\hat{\boldsymbol{\mathcal{H}}}}^i$. $\theta_{Pri}$ and $\theta_{Aux}$ denote the task-specific parameters for the primary task and the auxiliary task, respectively, and $\theta_S$ denotes the shared parameters. Our network consists of an encoder network to estimate the CSS, an encoder-decoder architecture for SRR, four spectral-attention layers to extract spectral correlation, output modules to generate $\hat{\mathbf{R}}^i$, and feature-guided upsampling modules (FUSEs) to upsample $\hat{\mathbf{R}}^i$ with the guidance of $e^{i - 1}$. The global average pooling before $\hat{\mathbf{S}}$ is omitted to simplify the illustration.
  • Figure 4: Qualitative comparison of error maps (MAE between the recovered results and the ground truth) with state-of-the-art approaches. The first four columns are from the synthetic data and the last three columns are from our collected real data.
  • Figure 5: Qualitative comparison of error maps (MAE between the recovered results and the ground truth) of our method with/without MAXL for $M = 1$ and $M = 2$ on real data.
  • ...and 3 more figures
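
The captions for Figures 3 and 5 refer to an auxiliary task used for adaptation on test images. As a toy illustration of that idea only, the sketch below fine-tunes a linear predictor on a single test observation using a self-supervised RGB reconstruction loss. It is not the MAXL procedure or the network in Figure 3: the linear model `W`, the known forward operator `H` (the paper estimates the CSS instead), the step size, and the synthetic data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 31, 6                       # hypothetical: 31 bands, 3 channels x 2 lights
H = rng.uniform(0, 1, (K, N))      # assumed-known forward operator
rgb = H @ rng.uniform(0, 1, N)     # one test observation; no ground truth is kept

# Stand-in for pretrained weights mapping RGB measurements to reflectance.
W = rng.normal(scale=0.01, size=(N, K))

def aux_loss(weights):
    """Self-supervised loss: re-render the prediction and compare with the input."""
    residual = H @ (weights @ rgb) - rgb
    return 0.5 * float(residual @ residual)

# Step size 1/L, where L bounds the curvature of this quadratic loss.
lr = 1.0 / (np.linalg.norm(H, 2) ** 2 * float(rgb @ rgb))

loss_before = aux_loss(W)
for _ in range(5):                 # a few test-time gradient steps
    residual = H @ (W @ rgb) - rgb
    grad = np.outer(H.T @ residual, rgb)   # gradient of aux_loss w.r.t. W
    W = W - lr * grad
loss_after = aux_loss(W)
print(loss_after < loss_before)
```

Lowering the reconstruction loss does not by itself guarantee a better reflectance estimate, since it only constrains the part of the spectrum that the measurements can observe.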

Theorems & Definitions (2)

  • Theorem 1
  • Proof 1