Visible and infrared self-supervised fusion trained on a single example

Nati Ofir, Jean-Christophe Nebel

TL;DR

Experiments demonstrate that the proposed approach achieves similar or better qualitative and quantitative multispectral fusion results than other state-of-the-art methods that do not rely on heavy training and/or large datasets.

Abstract

Multispectral imaging is an important task in image processing and computer vision, and is especially relevant to applications such as dehazing and object detection. With the development of the RGBT (RGB & Thermal) sensor, the problem of fusing visible (RGB) and Near Infrared (NIR) images has become particularly timely. Visible images capture color but suffer from noise, haze, and clouds, whereas the NIR channel captures a clearer picture. The proposed approach fuses these two channels by training a Convolutional Neural Network with Self-Supervised Learning (SSL) on a single example. For each RGB and NIR pair, the network is trained for a matter of seconds to produce the final fusion. The SSL is based on Structural Similarity and Edge-Preservation losses, where the labels for the SSL are the input channels themselves. This fusion preserves the relevant detail of each spectral channel without relying on a heavy training process. Experiments demonstrate that the proposed approach achieves similar or better qualitative and quantitative multispectral fusion results than other state-of-the-art methods that do not rely on heavy training and/or large datasets.
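To make the self-supervised objective more concrete, here is a minimal NumPy sketch of a loss that compares a candidate fused image against both input channels. It is an illustration under stated assumptions, not the authors' implementation: the single-window (global) SSIM, the finite-difference edge term, the equal weights, and the names `global_ssim`, `edge_loss`, and `fusion_loss` are all hypothetical choices made here.

```python
import numpy as np

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole image as one window (a simplification).

    Assumes both images are 2D float arrays with values in [0, 1].
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def edge_loss(fused, ref):
    """Mean squared difference of finite-difference gradients (assumed edge term)."""
    loss = 0.0
    for axis in (0, 1):
        loss += np.mean((np.diff(fused, axis=axis) - np.diff(ref, axis=axis)) ** 2)
    return loss

def fusion_loss(fused, gray, nir, w_ssim=1.0, w_edge=1.0):
    """The inputs themselves act as labels: the fused image is scored against both."""
    ssim_term = (1 - global_ssim(fused, gray)) + (1 - global_ssim(fused, nir))
    edge_term = edge_loss(fused, gray) + edge_loss(fused, nir)
    return w_ssim * ssim_term + w_edge * edge_term

# Usage on random stand-in data (not real RGB/NIR images).
rng = np.random.default_rng(0)
gray = rng.random((16, 16))
nir = rng.random((16, 16))
candidate = 0.5 * (gray + nir)
print(fusion_loss(candidate, gray, nir))
```

In a training loop, a loss of this kind would be minimized with respect to the network weights for the single input pair. No labeled fusion target is needed, because both reference images are the inputs.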

Paper Structure

This paper contains 9 sections, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Multispectral image fusion using RGB and NIR channels from the 'Country' category of the VIS-NIR dataset [BS11]. Left: Input RGB channel. Middle: Fusion outcome. Right: Input NIR channel.
  • Figure 2: CNN architecture of the proposed method. The network takes two channels as input and outputs a single fused channel. Following image alignment using an STN, four convolutions with two skip connections are applied. In addition, a UNet-ResNet18 is trained in parallel to compute an accurate fusion map to enhance quality.
  • Figure 3: Compact CNN architecture used in the ablation study reported in the results section.
  • Figure 4: Outcomes of the proposed multispectral image fusion. From left to right: input RGB, fused, and input NIR images. From top to bottom: images from the 'Mountain', 'Country', 'Urban', and 'Street' categories of the VIS-NIR dataset.
  • Figure 5: Left: Superpixel image fusion based on classic computer vision [ofir2023multispectral]. Right: the proposed image fusion. From top to bottom: images from the 'Country', 'Mountain', 'Country', 'Urban', and 'Street' categories of the VIS-NIR dataset.
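
The Figure 2 caption describes four convolutions with two skip connections that map two input channels to one fused channel. The forward pass below is a rough NumPy sketch of that core only, under several assumptions: the STN alignment step and the parallel UNet-ResNet18 branch are omitted, and the 3x3 kernels, the channel width of 8, the placement of the skip connections, the sigmoid output, and all function names are hypothetical choices not given in the caption.

```python
import numpy as np

def conv3x3(x, w, b):
    """'Same' 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def init_params(rng, width=8):
    """Random weights for four conv layers: 2 -> width -> width -> width -> 1."""
    shapes = [(width, 2), (width, width), (width, width), (1, width)]
    return [(rng.normal(0.0, 0.1, (o, i, 3, 3)), np.zeros(o)) for o, i in shapes]

def fuse_forward(gray, nir, params):
    """Two input channels in, one fused channel out."""
    x = np.stack([gray, nir])                    # (2, H, W)
    h1 = relu(conv3x3(x, *params[0]))
    h2 = relu(conv3x3(h1, *params[1])) + h1      # skip connection 1 (assumed placement)
    h3 = relu(conv3x3(h2, *params[2])) + h2      # skip connection 2 (assumed placement)
    out = conv3x3(h3, *params[3])
    return 1.0 / (1.0 + np.exp(-out[0]))         # sigmoid keeps the output in (0, 1)

# Usage on random stand-in data; weights are untrained.
rng = np.random.default_rng(0)
gray = rng.random((12, 10))
nir = rng.random((12, 10))
fused = fuse_forward(gray, nir, init_params(rng))
print(fused.shape)
```

With untrained weights the output is meaningless as a fusion. The sketch only shows how the tensor shapes flow through four convolutions and two residual additions.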