Generative Model-Assisted Demosaicing for Cross-multispectral Cameras

Jiahui Luo, Kai Feng, Haijin Zeng, Yongyong Chen

TL;DR

This work tackles cross-camera multispectral demosaicing where ground-truth labels are scarce by introducing GMAD, a three-stage hybrid supervised approach that leverages large-scale simulated data, Deep Image Prior-based pseudo-paired data, and targeted fine-tuning on pseudo labels. A frequency-domain hard patch selection module mitigates artifacts during fine-tuning, improving spectral fidelity and edge preservation. The authors also introduce UniSpecTest, a real-world multispectral mosaic dataset for robust benchmarking. Across real and synthetic datasets, GMAD demonstrates significant improvements over state-of-the-art methods and can match GT-based training performance in GT-free scenarios, illustrating strong cross-camera generalization and practical utility for snapshot MSI systems.

Abstract

As a crucial part of the spectral filter array (SFA)-based multispectral imaging process, spectral demosaicing has advanced rapidly with the proliferation of deep learning techniques. However, three problems remain: (1) because it is difficult to capture corresponding labels for real data or to simulate the practical spectral imaging process, end-to-end networks trained in a supervised manner on simulated data often perform poorly on real data; (2) cross-camera spectral discrepancies make it difficult to apply pre-trained models to new cameras; (3) existing demosaicing networks are prone to introducing visual artifacts on hard cases due to the interpolation of unknown values. To address these issues, we propose a hybrid supervised training method, assisted by a self-supervised generative model, that performs well on real data across different spectral cameras. Specifically, our approach consists of three steps: (1) Pre-Training step: training the end-to-end neural network on a large amount of simulated data; (2) Pseudo-Pairing step: generating pseudo-labels for real target data using the self-supervised generative model; (3) Fine-Tuning step: fine-tuning the pre-trained model on the pseudo data pairs obtained in (2). To alleviate artifacts, we propose a frequency-domain hard patch selection method that identifies artifact-prone regions by analyzing spectral discrepancies using the Fourier transform and filtering techniques, allowing targeted fine-tuning to enhance demosaicing performance. Finally, we propose UniSpecTest, a real-world multispectral mosaic image dataset for testing. Ablation experiments demonstrate the effectiveness of each training step, and extensive experiments on both synthetic and real datasets show that our method achieves significant performance gains over state-of-the-art techniques.
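To make the "simulated data" of the Pre-Training step concrete, the following is a minimal sketch of how a mosaic can be sampled from a full multispectral cube under a periodic SFA. It is an idealized illustration, not the authors' simulation pipeline: the function name `simulate_sfa_mosaic`, the 2x2 pattern, and the toy cube are assumptions, and the sketch ignores noise, optics, and sensor response, which are among the real-world factors that make simulated and real mosaics differ.

```python
import numpy as np

def simulate_sfa_mosaic(cube: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Sample a single-channel mosaic from an (H, W, C) cube.

    pattern is a (p, p) integer array (hypothetical layout): pattern[i, j]
    is the band index recorded at pixels (y, x) with y % p == i, x % p == j.
    """
    h, w, _ = cube.shape
    p = pattern.shape[0]
    # Tile the pattern over the image (ceil division), then crop to size.
    reps = (-(-h // p), -(-w // p))
    band_idx = np.tile(pattern, reps)[:h, :w]
    rows, cols = np.indices((h, w))
    # Each pixel keeps only the band selected by the filter array.
    return cube[rows, cols, band_idx]

# Toy cube: band b is constant with value b, so the mosaic shows the pattern.
cube = np.stack([np.full((4, 4), float(b)) for b in range(4)], axis=-1)
pattern = np.array([[0, 1], [2, 3]])
mosaic = simulate_sfa_mosaic(cube, pattern)
```

Demosaicing is the inverse task: recovering all C bands at every pixel from `mosaic`, where each pixel retains a single band sample.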

Paper Structure

This paper contains 24 sections, 20 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Comparison of training performance using GMAD-generated pseudo-labels versus GT labels across two backbone networks (MCAN+ and MambaIR). PSNR evaluation demonstrates that on the MCAN+ network, training with GMAD-generated pseudo-labels even outperforms GT-based training, thereby validating that the proposed GMAD method can achieve competitive performance when GT data is unavailable.
  • Figure 2: The imaging process of a snapshot spectral imaging system. Numerous factors in real-world environments influence the generation of a mosaic image.
  • Figure 3: Overview of our GMAD pipeline. Step 1 pre-trains the network on abundant simulated data to learn key multispectral image features. Step 2 employs a generative model to produce a pseudo-demosaiced cube, yielding a paired training dataset for the target domain, which lacks GT. Step 3 is the fine-tuning step, where supervised fine-tuning on the pseudo-paired dataset enhances network performance in real applications. This method combines unsupervised and supervised learning to enable knowledge transfer and model generalization across different SFA cameras without requiring GT.
  • Figure 4: Comparison of GMAD (with MCAN+ as the backbone) with other SOTA demosaicing methods. (a) Three types of artifacts are typically produced during the demosaicing process, indicated by red, green, and yellow boxes, respectively. (b) Comparison of spectral curves at different points within the artifacts. The artifacts correspond to larger errors in the spectral curves, and larger jitter in the error curves indicates a larger difference in the corresponding spectral features.
  • Figure 5: The process of generating a frequency variation map. The data are traversed k times and processed step by step for each channel. The maximum value is then taken at every pixel, highlighting the frequency components with the largest variations. Finally, a bandpass filter removes the non-artifact parts of the frequency band.
  • ...and 5 more figures
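The frequency-domain hard patch selection outlined in the abstract and in the Figure 5 caption can be illustrated with a rough sketch. The details below are assumptions rather than the authors' algorithm: the score is computed on the residual between the network output and the pseudo-label, the maximum is taken over channels for each frequency bin, and the band limits `low` and `high` (in cycles per pixel), the patch size, and all function names are hypothetical.

```python
import numpy as np

def bandpass_mask(size: int, low: float, high: float) -> np.ndarray:
    """Boolean mask keeping radial frequencies in [low, high] cycles/pixel."""
    f = np.fft.fftfreq(size)  # same bin layout as np.fft.fft2 output
    radius = np.sqrt(f[:, None] ** 2 + f[None, :] ** 2)
    return (radius >= low) & (radius <= high)

def patch_score(residual_patch: np.ndarray, mask: np.ndarray) -> float:
    """Band-limited spectral energy of a (P, P, C) residual patch."""
    mags = np.abs(np.fft.fft2(residual_patch, axes=(0, 1)))  # per channel
    variation = mags.max(axis=2)  # strongest channel per frequency bin
    return float(variation[mask].sum())

def select_hard_patches(pred, pseudo, patch=8, top_k=1, low=0.2, high=0.5):
    """Return top-left corners of the top_k highest-scoring patches."""
    residual = pred - pseudo
    mask = bandpass_mask(patch, low, high)
    h, w, _ = residual.shape
    scored = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = residual[y:y + patch, x:x + patch]
            scored.append((patch_score(tile, mask), (y, x)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pos for _, pos in scored[:top_k]]

# Toy data: one patch has a stripe-like error, another a constant offset.
pseudo = np.zeros((16, 16, 3))
pred = pseudo.copy()
stripes = np.tile([1.0, 1.0, -1.0, -1.0], 2)  # period 4 -> 0.25 cycles/pixel
pred[8:16, 0:8, 1] += stripes[None, :]         # artifact-like error
pred[0:8, 8:16, :] += 5.0                      # smooth offset, outside the band
hard = select_hard_patches(pred, pseudo)
```

In this toy case the striped patch at `(8, 0)` is selected, while the patch with the constant offset is not, because its error sits at zero frequency and is excluded by the band-pass mask. The selected patches would then be the ones emphasized during fine-tuning.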