Visible and Infrared Image Fusion Using Encoder-Decoder Network
Ferhat Can Ataman, Gözde Bozdaği Akar
TL;DR
The paper addresses infrared-visible image fusion with a lightweight, end-to-end encoder–decoder network composed of dual encoders, a fusion module, and a decoder. Fusion is performed per layer by channel-wise aggregation with 1×1 convolutions, and skip connections in the decoder help reconstruct the fused image. Training is guided by a no-reference loss based on PaQ-derived quality metrics, supplemented by MSE terms, so no ground-truth fused images are needed. Empirical results show state-of-the-art perceptual quality with significantly faster inference (≈178 FPS), making the method suitable for real-time embedded vision tasks. The work is relevant to applications such as object detection and tracking in multimodal scenes, and it points to deployment on edge devices and dataset expansion as future directions.
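To make the per-layer fusion step concrete, the following is a minimal NumPy sketch of channel-wise aggregation with a 1×1 convolution. It is an illustration under stated assumptions, not the authors' implementation: the function names `conv1x1` and `fuse_layer`, the feature shapes, and the averaging weights are all hypothetical, and in the actual network the weights would be learned.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution, i.e. a per-pixel linear map over channels.

    x:      feature map of shape (C_in, H, W)
    weight: matrix of shape (C_out, C_in)
    bias:   vector of shape (C_out,)
    """
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def fuse_layer(feat_vis, feat_ir, weight, bias):
    """Fuse visible and infrared features taken from the same encoder depth."""
    # Stack the two modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([feat_vis, feat_ir], axis=0)
    # Mix the 2C channels back down to C channels at every pixel.
    return conv1x1(stacked, weight, bias)


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8  # hypothetical sizes for one encoder layer
feat_vis = rng.standard_normal((C, H, W))
feat_ir = rng.standard_normal((C, H, W))

# Hypothetical weights (assumption): average the two modalities channel by
# channel. A trained network would learn these values instead.
weight = np.concatenate([0.5 * np.eye(C), 0.5 * np.eye(C)], axis=1)  # (C, 2C)
bias = np.zeros(C)

fused = fuse_layer(feat_vis, feat_ir, weight, bias)
print(fused.shape)
```

With these particular weights the fused map is simply the mean of the two inputs, which makes the behaviour easy to check. Learned weights would let each output channel draw unequally on the visible and infrared features.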
Abstract
The aim of multispectral image fusion is to combine object or scene features from images with different spectral characteristics to increase perceptual quality. In this paper, we present a novel learning-based solution to the image fusion problem, focusing on infrared and visible spectrum images. The proposed solution uses only convolution and pooling layers together with a loss function based on no-reference quality metrics. The analysis is performed qualitatively and quantitatively on various datasets. The results show better performance than state-of-the-art methods. In addition, the small size of our network enables real-time performance on embedded devices. The project code can be found at https://github.com/ferhatcan/pyFusionSR.
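As a rough illustration of how a no-reference quality term might be combined with MSE terms when no ground-truth fused image exists, consider the sketch below. The exact form and weighting of the paper's loss are not given here, so everything in it is an assumption: the function `fusion_loss`, the weights `w_quality` and `w_mse`, and especially `toy_quality`, which is a trivial stand-in and not the PaQ-based metric.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two images of the same shape."""
    return float(np.mean((a - b) ** 2))


def toy_quality(img):
    """Stand-in quality score in [0, 1] (assumption, NOT the PaQ-based metric).

    Uses global contrast as a crude proxy so that the example runs without a
    learned quality model.
    """
    return float(np.clip(2.0 * img.std(), 0.0, 1.0))


def fusion_loss(fused, visible, infrared, quality_fn, w_quality=1.0, w_mse=0.5):
    """Hypothetical loss: reward predicted quality, stay close to both inputs."""
    quality_term = 1.0 - quality_fn(fused)  # lower is better
    fidelity_term = mse(fused, visible) + mse(fused, infrared)
    return w_quality * quality_term + w_mse * fidelity_term


# Toy images with intensities in [0, 1].
visible = np.zeros((4, 4))
infrared = np.ones((4, 4))
fused = np.full((4, 4), 0.5)  # a flat average of the two inputs

loss = fusion_loss(fused, visible, infrared, toy_quality)
print(loss)
```

Neither term compares the output to a reference fused image: the quality term scores the output alone, and the MSE terms only tie it to the two source images.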
