Dig2DIG: Dig into Diffusion Information Gains for Image Fusion

Bing Cao, Baoshuo Cai, Changqing Zhang, Qinghua Hu

TL;DR

This work tackles dynamic multi-modal fusion in diffusion-based image fusion, starting from the observation that denoising is spatio-temporally imbalanced. It introduces diffusion information gains (DIG) to quantify each modality's contribution at every denoising step and develops Dig2DIG, a dynamic fusion framework that adjusts the guidance weights to reduce an upper bound on the generalization error, $\mathrm{GError}(F) \le C - \sum_{t=1}^T \sum_{k=1}^K \mathrm{Cov}(w_k, B)$. The authors prove that aligning fusion weights with the corresponding guidance contributions reduces this bound, and they instantiate the weighting as a softmax over DIG values, $w_k(t) = \frac{\exp(\mathrm{DIG}_k(t))}{\sum_j \exp(\mathrm{DIG}_j(t))}$. Empirically, Dig2DIG outperforms existing diffusion-based approaches in fusion quality and inference efficiency across visible-infrared, multi-focus, and multi-exposure tasks without additional training, supporting both the theoretical analysis and the practical benefit of dynamic guidance.
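The softmax weighting $w_k(t)$ quoted in the TL;DR can be illustrated with a minimal sketch. It assumes a single scalar DIG score per modality at each step; the function name `dig_softmax_weights`, the stage labels, and the DIG values are hypothetical and chosen only for illustration. They are not taken from the paper, and the sketch does not show how DIG itself is computed.

```python
import math


def dig_softmax_weights(dig_values):
    """Map per-modality DIG scores at one denoising step to fusion weights.

    Implements w_k = exp(DIG_k) / sum_j exp(DIG_j) for a list of scalars.
    """
    if not dig_values:
        raise ValueError("need at least one modality")
    # Subtracting the maximum leaves the softmax unchanged but avoids overflow.
    peak = max(dig_values)
    exps = [math.exp(v - peak) for v in dig_values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical DIG scores for K = 2 modalities, ordered (infrared, visible),
# at three illustrative stages of denoising. These numbers are made up.
dig_schedule = {
    "early": [2.0, 0.5],
    "middle": [1.0, 1.0],
    "late": [0.3, 1.8],
}

weights = {stage: dig_softmax_weights(v) for stage, v in dig_schedule.items()}
for stage, w in weights.items():
    print(stage, [round(x, 3) for x in w])
```

With these made-up scores, the modality with the larger DIG at a given stage receives the larger weight, and the weights at each stage sum to one. This is the sense in which the guidance follows the information gain instead of staying fixed.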

Abstract

Image fusion integrates complementary information from multi-source images to generate more informative results. Recently, the diffusion model, which demonstrates unprecedented generative potential, has been explored in image fusion. However, these approaches typically incorporate predefined multimodal guidance into diffusion, failing to capture the dynamically changing significance of each modality, while lacking theoretical guarantees. To address this issue, we reveal a significant spatio-temporal imbalance in image denoising; specifically, the diffusion model produces dynamic information gains in different image regions with denoising steps. Based on this observation, we Dig into the Diffusion Information Gains (Dig2DIG) and theoretically derive a diffusion-based dynamic image fusion framework that provably reduces the upper bound of the generalization error. Accordingly, we introduce diffusion information gains (DIG) to quantify the information contribution of each modality at different denoising steps, thereby providing dynamic guidance during the fusion process. Extensive experiments on multiple fusion scenarios confirm that our method outperforms existing diffusion-based approaches in terms of both fusion quality and inference efficiency.

Paper Structure

This paper contains 18 sections, 1 theorem, 33 equations, 6 figures, 5 tables.

Key Result

Theorem 1

For a multi-source image fusion operator $F$ that employs diffusion-based conditional guidance, the Generalization Error (GError) can be decomposed as follows: (i) A linear combination of projection terms, where each term represents the projection of a single-modal conditional guidance onto the idea

Figures (6)

  • Figure 1: Dynamic guidance fusion vs. Fixed guidance fusion. The infrared modality, due to its pronounced structural cues, finishes most of its reconstruction earlier, whereas the visible modality, with its abundant texture information, continues to provide significant detail in the later denoising stages.
  • Figure 2: The framework of our Dig2DIG. Deriving from generalization theory, we find that the key to reducing the upper bound of fusion generalization error is ensuring that the projection of the guidance weight and guidance direction onto the ideal fusion direction is positively correlated. To achieve this, we utilize DIG to estimate this projection, providing theoretical guidance for reducing generalization error and effectively incorporating information during the fusion process.
  • Figure 3: Qualitative comparisons of our method and the competing approaches on the M3FD dataset.
  • Figure 4: Qualitative comparisons of our method and the competing approaches on the MFFW and MEFB datasets.
  • Figure 5: The spatio-temporal imbalance of information gains.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 1