Coupled Degradation Modeling and Fusion: A VLM-Guided Degradation-Coupled Network for Degradation-Aware Infrared and Visible Image Fusion

Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui

TL;DR

This work tackles degraded infrared and visible image fusion (IVIF) by introducing VGDCFusion, a degradation-aware framework that tightly couples degradation modeling with fusion using vision-language model (VLM) prompts. It comprises two modules: SPDCE, which enables intra-modality degradation awareness and couples degradation suppression with feature extraction, and JPDCF, which enables cross-modal degradation perception and integrates degradation filtering with cross-modal feature fusion. The authors report superior performance under diverse degraded scenarios, supported by experiments, ablations, and a downstream object detection task, which suggests practical value for robust IVIF in real-world conditions. The authors also provide code to support adoption and further research in degradation-aware multi-modal fusion.

Abstract

Existing Infrared and Visible Image Fusion (IVIF) methods typically assume high-quality inputs. However, when handling degraded images, these methods rely heavily on manually switching between different pre-processing techniques. This decoupling of degradation handling and image fusion leads to significant performance degradation. In this paper, we propose a novel VLM-Guided Degradation-Coupled Fusion network (VGDCFusion), which tightly couples degradation modeling with the fusion process and leverages vision-language models (VLMs) for degradation-aware perception and guided suppression. Specifically, the proposed Specific-Prompt Degradation-Coupled Extractor (SPDCE) enables modality-specific degradation awareness and jointly models degradation suppression and intra-modal feature extraction. In parallel, the Joint-Prompt Degradation-Coupled Fusion (JPDCF) facilitates cross-modal degradation perception and couples residual degradation filtering with complementary cross-modal feature fusion. Extensive experiments demonstrate that our VGDCFusion significantly outperforms existing state-of-the-art fusion approaches under various degraded image scenarios. Our code is available at https://github.com/Lmmh058/VGDCFusion.

Paper Structure

This paper contains 26 sections, 11 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Comparison of fusion strategies under dual-modality degradation.
  • Figure 2: Architecture of VGDCFusion. The top‑right inset shows the prompt components (where "IVIF" denotes Infrared and Visible Image Fusion) and an example; the bottom‑right legend details each block in the diagram. Notably, the encoded prompt features are injected into every layer of both SPDCE and JPDCF modules to guide degradation‑aware fusion.
  • Figure 3: Network architecture of the SPDCE. "BC" denotes the abbreviation for the broadcasting operation.
  • Figure 4: Network architecture of the JPDCF. The computation of $W^{P}_{fu}$ and $B^{P}_{fu}$ is shown in the grey box (top right).
  • Figure 5: Qualitative comparison of VGDCFusion and seven comparative methods on MSRS, LLVIP, and M3FD datasets. “Prompt” denotes the fusion guidance in VGDCFusion. Key regions are highlighted with red/green boxes for visual clarity.
  • ...and 3 more figures
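
The captions for Figures 2 to 4 mention encoded prompt features injected into every layer, a broadcasting operation, and prompt-derived terms $W^{P}_{fu}$ and $B^{P}_{fu}$. One plausible reading, which is an assumption and not confirmed by the text above, is a per-channel affine modulation in which the prompt embedding produces a scale and a shift that are broadcast over the spatial dimensions of a feature map. The sketch below illustrates only that generic idea in plain Python; the function names, the use of linear projections, and all shapes are hypothetical and are not taken from the VGDCFusion code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def prompt_modulate(features: FeatureMap, prompt: Vector,
                    w_scale: Matrix, b_scale: Vector,
                    w_shift: Matrix, b_shift: Vector) -> FeatureMap:
    """Hypothetical prompt-conditioned affine modulation.

    Assumption: the prompt embedding is projected to one scale and one
    shift per channel, and both are broadcast over every spatial position.
    """
    scale = linear(w_scale, b_scale, prompt)  # one value per channel
    shift = linear(w_shift, b_shift, prompt)  # one value per channel
    if len(scale) != len(features) or len(shift) != len(features):
        raise ValueError("projection size must match the channel count")
    # Broadcast: the same (scale, shift) pair is applied to a whole channel.
    return [[[s * value + t for value in row] for row in channel]
            for channel, s, t in zip(features, scale, shift)]


if __name__ == "__main__":
    feats = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0
             [[0.5, 0.5], [0.5, 0.5]]]   # channel 1
    prompt_embedding = [1.0, 0.0]        # stand-in for an encoded text prompt
    out = prompt_modulate(feats, prompt_embedding,
                          w_scale=[[2.0, 0.0], [0.0, 1.0]], b_scale=[0.0, 1.0],
                          w_shift=[[0.0, 0.0], [1.0, 0.0]], b_shift=[0.5, 0.0])
    print(out)
```

In a real network the projections would be learned and the feature maps would be tensors; the point here is only that a single prompt vector can steer every spatial location of a channel through one scale and one shift.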