
Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared

Yafei Zhang, Meng Ma, Huafeng Li, Yu Liu

TL;DR

This work addresses missing-IR fusion with a dictionary-guided, coefficient-domain framework built upon a shared convolutional dictionary. To the authors' knowledge, it is the first framework that jointly learns a shared dictionary and performs coefficient-domain inference and fusion for missing-IR fusion.

Abstract

Infrared-visible (IR-VIS) image fusion is vital for perception and security, yet most methods assume that both modalities are available during training and inference. When the infrared modality is absent, pixel-space generative substitutes are hard to control and inherently lack interpretability. We address missing-IR fusion with a dictionary-guided, coefficient-domain framework built upon a shared convolutional dictionary. The pipeline comprises three key components: (1) Joint Shared-dictionary Representation Learning (JSRL) learns a unified and interpretable atom space shared by the IR and VIS modalities; (2) VIS-Guided IR Inference (VGII) transfers VIS coefficients to pseudo-IR coefficients in the coefficient domain and performs a one-step closed-loop refinement guided by a frozen large language model that serves as a weak semantic prior; and (3) Adaptive Fusion via Representation Inference (AFRI) merges VIS structures and inferred IR cues at the atom level through window attention and convolutional mixing, followed by reconstruction with the shared dictionary. This encode-transfer-fuse-reconstruct pipeline avoids uncontrolled pixel-space generation while preserving priors within an interpretable dictionary-coefficient representation. Experiments under missing-IR settings demonstrate consistent improvements in perceptual quality and downstream detection performance. To our knowledge, this is the first framework that jointly learns a shared dictionary and performs coefficient-domain inference and fusion to tackle missing-IR fusion. The source code is publicly available at https://github.com/harukiv/DCMIF.
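The "encode" and "reconstruct" stages rest on convolutional sparse coding: a signal is modeled as a sum of dictionary atoms convolved with sparse coefficient maps, and because JSRL shares one dictionary across IR and VIS, the coefficients of both modalities index the same atoms. The sketch below is a minimal, non-authoritative illustration under simplifying assumptions: 1D signals stand in for images, a fixed hand-picked dictionary stands in for the learned one, and plain ISTA stands in for the paper's encoder. The names conv_full, corr_valid, synthesize, soft, and encode are hypothetical and do not come from the DCMIF code.

```python
# Minimal sketch (assumptions): 1D signals stand in for images, a fixed
# hand-picked dictionary stands in for the learned JSRL dictionary, and plain
# ISTA stands in for whatever encoder the paper actually uses.


def conv_full(z, d):
    """Full 1D convolution of a coefficient map z with atom d."""
    out = [0.0] * (len(z) + len(d) - 1)
    for i, zi in enumerate(z):
        for j, dj in enumerate(d):
            out[i + j] += zi * dj
    return out


def corr_valid(r, d):
    """Adjoint of conv_full: 'valid' correlation of a residual r with atom d."""
    m = len(r) - len(d) + 1
    return [sum(r[i + j] * d[j] for j in range(len(d))) for i in range(m)]


def synthesize(coeffs, dictionary):
    """Reconstruct x = sum_k d_k * z_k with the shared dictionary."""
    x = [0.0] * (len(coeffs[0]) + len(dictionary[0]) - 1)
    for z, d in zip(coeffs, dictionary):
        for i, v in enumerate(conv_full(z, d)):
            x[i] += v
    return x


def soft(v, t):
    """Soft-thresholding, the proximal operator of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def encode(x, dictionary, lam=0.01, iters=2000):
    """ISTA for min_z 0.5 * ||x - sum_k d_k * z_k||^2 + lam * sum_k ||z_k||_1."""
    m = len(x) - len(dictionary[0]) + 1
    # sum_k ||d_k||_1^2 >= ||D||^2, so this step meets ISTA's 1/L condition.
    step = 1.0 / sum(sum(abs(c) for c in d) ** 2 for d in dictionary)
    coeffs = [[0.0] * m for _ in dictionary]
    for _ in range(iters):
        residual = [a - b for a, b in zip(x, synthesize(coeffs, dictionary))]
        coeffs = [
            [soft(z + step * g, step * lam) for z, g in zip(zk, corr_valid(residual, d))]
            for zk, d in zip(coeffs, dictionary)
        ]
    return coeffs


# Usage: one dictionary (a smooth atom and an edge atom) that both modalities share.
dictionary = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]
z_true = [[0.0] * 18 for _ in dictionary]
z_true[0][3], z_true[1][10] = 1.0, 2.0
signal = synthesize(z_true, dictionary)  # length 20
z_hat = encode(signal, dictionary)       # coefficients in the shared atom space
recon = synthesize(z_hat, dictionary)
rel_err = (sum((a - b) ** 2 for a, b in zip(signal, recon)) / sum(a * a for a in signal)) ** 0.5
```

Encoding an IR signal and a VIS signal with the same `dictionary` yields coefficient maps of identical shape whose k-th channel refers to the same atom in both modalities. This shared indexing is what makes atom-level transfer and fusion meaningful.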

Paper Structure

This paper contains 21 sections, 20 equations, 9 figures, 9 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Comparison between existing methods and ours. For infrared-missing cross-modal fusion, (a) existing methods adopt a multi-stage framework that generates infrared images before fusion, while (b) we propose a single-stage framework that directly infers infrared features from visible images without generating infrared images.
  • Figure 2: Architecture of the proposed framework. It integrates three modules (JSRL, VGII, and AFRI) to form a closed-loop pipeline of encoding, inference, fusion, and reconstruction. JSRL learns a shared dictionary for cross-modal alignment, VGII predicts latent infrared coefficients from visible ones, and AFRI fuses both modalities in the coefficient domain to reconstruct consistent, high-quality images under missing-IR conditions.
  • Figure 3: Structure of the IV-DLB.
  • Figure 4: Visual comparison results on the FLIR, MSRS, and KAIST datasets for fused image quality.
  • Figure 5: Qualitative comparison of object detection and semantic segmentation on the M³FD and FMB datasets, respectively.
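The coefficient-domain transfer and fusion stages summarized in Figure 2 (VGII, then AFRI) can be illustrated with a toy sketch. This is not the authors' method. A per-position linear mix across atoms (a 1×1-convolution-style map) stands in for VGII and omits the LLM-guided refinement entirely. A per-window softmax over mean coefficient activity replaces window attention and convolutional mixing. The functions transfer and window_fuse, the weights, and the window size are all hypothetical.

```python
import math

# Toy sketch (assumptions): coefficient maps are lists indexed [atom][position];
# the transfer weights and window size are made up, not taken from DCMIF.


def transfer(vis_coeffs, weights, bias):
    """Pseudo-IR coefficients via a per-position linear mix across atoms
    (a 1x1-convolution stand-in for VGII, without any LLM-guided refinement)."""
    n_atoms, length = len(vis_coeffs), len(vis_coeffs[0])
    return [
        [sum(weights[a][b] * vis_coeffs[b][i] for b in range(n_atoms)) + bias[a]
         for i in range(length)]
        for a in range(n_atoms)
    ]


def window_fuse(vis_coeffs, ir_coeffs, window=4, temperature=1.0):
    """Per atom and per non-overlapping window, blend the two modalities with
    softmax weights over their mean absolute coefficient (activity)."""
    fused = []
    for zv, zi in zip(vis_coeffs, ir_coeffs):
        out = []
        for start in range(0, len(zv), window):
            sv, si = zv[start:start + window], zi[start:start + window]
            act_v = sum(abs(v) for v in sv) / len(sv)
            act_i = sum(abs(v) for v in si) / len(si)
            top = max(act_v, act_i)  # subtract the max for a numerically stable softmax
            e_v = math.exp((act_v - top) / temperature)
            e_i = math.exp((act_i - top) / temperature)
            w_v = e_v / (e_v + e_i)
            out.extend(w_v * a + (1.0 - w_v) * b for a, b in zip(sv, si))
        fused.append(out)
    return fused


# Usage with two atoms and eight positions per coefficient map.
vis = [[0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, -0.5, 0.0, 0.0]]
weights = [[0.2, 0.0], [0.0, 1.5]]  # hypothetical learned transfer matrix
pseudo_ir = transfer(vis, weights, bias=[0.0, 0.0])
fused = window_fuse(vis, pseudo_ir, window=4)
```

Every fused value is a convex combination of the VIS and pseudo-IR coefficients at the same atom and position. The result therefore stays in the shared atom space and can be decoded with the same dictionary, with no pixel-space generation step in between.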