One Latent Space to Rule All Degradations: Unifying Restoration Knowledge for Image Fusion
Haolong Ma, Hui Li, Chunyang Cheng, Zeyang Zhang, Xiaoqing Luo, Xiaoning Song, Xiao-Jun Wu
TL;DR
The paper tackles degradations in multi-modal infrared–visible fusion, critiquing current All-in-One degradation-aware models for relying on synthetic data and entangled data-level degradations. It introduces LURE, a two-stage framework that first learns a Unified Latent Feature Space (ULFS) from high-quality restoration data and then learns fusion rules within that space, aided by a pseudo-degradation task to stabilize distribution alignment. A novel inner residual design and a Text-Guided Attention mechanism support robust feature learning and effective degradation-agnostic fusion, with losses including $\mathcal{L}_{unified}$ to align latent representations across degradations. Empirically, LURE achieves state-of-the-art results on vanilla and degradation-aware fusion benchmarks and improves downstream multi-modal semantic segmentation, while reducing reliance on synthetic degradation datasets. The approach offers a scalable, generalizable path for robust multi-modal fusion across diverse real-world degradations and can extend to other multi-modal fusion tasks.
Abstract
All-in-One Degradation-Aware Fusion Models (ADFMs), a class of multi-modal image fusion models, aim to handle complex scenes by mitigating degradations in the source images and generating high-quality fused images. Mainstream ADFMs rely on end-to-end learning and heavily synthesized datasets to achieve degradation awareness and fusion. This coarse learning strategy and the dependence on datasets that do not reflect real-world scenarios often limit their upper-bound performance, leading to low-quality results. To address these limitations, we present LURE, a degradation-aware Learning-driven Unified REpresentation model for infrared and visible image fusion. LURE learns a Unified Latent Feature Space (ULFS) to avoid the dependency on complex data formats inherent in previous end-to-end learning pipelines. It further improves image fusion quality by leveraging the intrinsic relationships between modalities. A novel loss function is also proposed to make the learning of unified latent representations more stable. More importantly, LURE seamlessly incorporates existing high-quality real-world image restoration datasets. To further enhance the model's representation capability, we design a simple yet effective structure, termed the internal residual block, to facilitate the learning of latent features. Experiments show that our method outperforms state-of-the-art (SOTA) methods across general fusion, degradation-aware fusion, and downstream tasks. The code is available in the supplementary materials.
