Dif-Fusion: Towards High Color Fidelity in Infrared and Visible Image Fusion with Diffusion Models

Jun Yue, Leyuan Fang, Shaobo Xia, Yue Deng, Jiayi Ma

TL;DR

Dif-Fusion is a diffusion-model-based method that models the distribution of the multi-channel input data, which improves multi-source information aggregation and color fidelity. The paper also introduces Delta E as a new evaluation metric to quantify color fidelity.

Abstract

Color plays an important role in human visual perception, reflecting the spectrum of objects. However, existing infrared and visible image fusion methods rarely explore how to handle multi-spectral/channel data directly and achieve high color fidelity. This paper addresses this issue by proposing a novel method based on diffusion models, termed Dif-Fusion, to generate the distribution of the multi-channel input data, which increases the ability of multi-source information aggregation and the fidelity of colors. Specifically, instead of converting multi-channel images into single-channel data as existing fusion methods do, we create the multi-channel data distribution with a denoising network in a latent space with forward and reverse diffusion processes. Then, we use the denoising network to extract the multi-channel diffusion features with both visible and infrared information. Finally, we feed the multi-channel diffusion features to the multi-channel fusion module to directly generate the three-channel fused image. To retain the texture and intensity information, we propose a multi-channel gradient loss and a multi-channel intensity loss. Along with the current evaluation metrics for measuring texture and intensity fidelity, we introduce a new evaluation metric to quantify color fidelity. Extensive experiments indicate that our method is more effective than other state-of-the-art image fusion methods, especially in color fidelity.
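To make the color-fidelity metric concrete, the following is a minimal sketch of a Delta E computation between a visible image and a fused image. It assumes the simple CIE76 formula (Euclidean distance in CIELAB) and sRGB inputs scaled to [0, 1]; the abstract does not state which Delta E variant the paper uses, so the formula choice and the function names `srgb_to_lab` and `mean_delta_e76` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# D65 reference white (assumed illuminant for sRGB).
D65_WHITE = np.array([0.95047, 1.0, 1.08883])

# Linear sRGB -> CIE XYZ matrix for a D65 white point.
RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])


def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1], shape (..., 3), to CIELAB."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer curve to obtain linear RGB.
    linear = np.where(rgb <= 0.04045,
                      rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = linear @ RGB_TO_XYZ.T / D65_WHITE
    # Piecewise cube-root nonlinearity of CIELAB.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3,
                 np.cbrt(xyz),
                 xyz / (3.0 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def mean_delta_e76(reference_rgb, fused_rgb):
    """Mean per-pixel CIE76 color difference; lower means higher color fidelity."""
    diff = srgb_to_lab(reference_rgb) - srgb_to_lab(fused_rgb)
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))


# Usage on hypothetical (H, W, 3) images: a slightly shifted copy stands in for a fused image.
rng = np.random.default_rng(0)
visible = rng.uniform(0.0, 1.0, size=(4, 4, 3))
fused = np.clip(visible + 0.02, 0.0, 1.0)
score = mean_delta_e76(visible, fused)
```

Under this sketch, a fused image whose colors match the visible image exactly scores 0, and larger values indicate larger perceived color deviation.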

Paper Structure

This paper contains 26 sections, 10 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Illustration of color fidelity. From (a) to (d): infrared image, visible image, and fused images of U2Fusion 9151265 and our proposed Dif-Fusion. The dotted circles in red, yellow, and green show the color differences between the visible and the fused images of the wall, the pavement, and the vegetation, respectively. Compared with the existing method, Dif-Fusion achieves higher color fidelity.
  • Figure 2: The overall framework of Dif-Fusion. $\bm{I_{0}}$ and $\bm{I_{t}}$ denote the multi-channel input and the multi-channel data after $t$ timesteps of the forward diffusion process. $P(\cdot|\cdot)$ and $Q(\cdot|\cdot)$ stand for the forward diffusion process and the reverse diffusion process. $\mathcal{L}_{MCI}$ and $\mathcal{L}_{MCG}$ represent the multi-channel intensity loss and the multi-channel gradient loss.
  • Figure 3: Visible and infrared image pairs generated from the diffusion models.
  • Figure 4: The structure of the denoising network and the multi-channel fusion module.
  • Figure 5: Qualitative comparison of Dif-Fusion with six state-of-the-art methods on the 00634D image pair from the MSRS dataset.
  • ...and 6 more figures
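
The notation in the Figure 2 caption can be illustrated with a short sketch of the forward diffusion step that turns a multi-channel input $I_0$ into a noisy $I_t$. This assumes a standard DDPM-style closed-form noising with a linear variance schedule, and assumes the multi-channel input is a three-channel visible image stacked with a one-channel infrared image; neither detail is stated in the text above, and the names `linear_beta_schedule` and `forward_diffuse` are hypothetical.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(i0, t, betas, rng):
    """Sample I_t from I_0 in closed form: sqrt(a_bar) * I_0 + sqrt(1 - a_bar) * noise."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(i0.shape)
    i_t = np.sqrt(alpha_bar) * i0 + np.sqrt(1.0 - alpha_bar) * noise
    return i_t, noise


rng = np.random.default_rng(0)
# Hypothetical inputs scaled to [-1, 1], in channel-first layout.
visible = rng.uniform(-1.0, 1.0, size=(3, 8, 8))   # RGB visible image
infrared = rng.uniform(-1.0, 1.0, size=(1, 8, 8))  # single-channel infrared image
i0 = np.concatenate([visible, infrared], axis=0)   # assumed 4-channel input I_0

betas = linear_beta_schedule(1000)
i_t, noise = forward_diffuse(i0, 500, betas, rng)
```

In a setup like this, a denoising network would be trained to predict `noise` from `i_t` and `t`; its intermediate activations are the kind of multi-channel diffusion features the abstract describes feeding to the fusion module.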