Task-Customized Mixture of Adapters for General Image Fusion

Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu

TL;DR

The paper addresses the challenge of general image fusion across multiple tasks with distinct fusion mechanisms. It introduces Task-Customized Mixture of Adapters (TC-MoA), a MoE-inspired, parameter-efficient approach that inserts adapters and a task-specific router into a frozen ViT backbone to generate per-task prompts for multi-source fusion. Mutual information regularization and task-specific loss terms guide the adapters to retain complementary information while preserving task-specific structure, intensity, and gradient cues. Experiments on VIF, MEF, and MFF demonstrate competitive or state-of-the-art performance among general fusion methods and strong performance relative to task-specific baselines, along with controllability and zero-shot generalization. The approach achieves high efficiency (about 2.8% trainable parameters) and offers flexible prompt and router control, suggesting practical impact for unified, adaptable image fusion systems.

Abstract

General image fusion aims at integrating important information from multi-source images. However, due to the significant cross-task gap, the respective fusion mechanism varies considerably in practice, resulting in limited performance across subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusion tasks in a unified model. We borrow the insight from the mixture of experts (MoE), taking the experts as efficient tuning adapters to prompt a pre-trained foundation model. These adapters are shared across different tasks and constrained by mutual information regularization, ensuring compatibility with different tasks while preserving complementarity for multi-source images. The task-specific routing networks customize these adapters to extract task-specific information from different sources with dynamic dominant intensity, performing adaptive visual feature prompt fusion. Notably, our TC-MoA controls the dominant intensity bias for different fusion tasks, successfully unifying multiple fusion tasks in a single model. Extensive experiments show that TC-MoA outperforms the competing approaches in learning commonalities while retaining compatibility for general image fusion (multi-modal, multi-exposure, and multi-focus), and also demonstrates striking controllability in further generalization experiments. The code is available at https://github.com/YangSun22/TC-MoA .
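The routing idea in the abstract (adapters shared across tasks, with one routing network per task choosing how to mix them) can be illustrated with a small sketch. This is not the authors' implementation: the bottleneck adapter shape, the top-k selection, the random initialization, and every name below (`SharedAdapter`, `TaskRoutedAdapters`, `route`, `prompt`) are assumptions made for illustration only. It uses only the Python standard library.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Small random matrix; stands in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedAdapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project."""

    def __init__(self, dim, hidden, rng):
        self.down = make_matrix(hidden, dim, rng)
        self.up = make_matrix(dim, hidden, rng)

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]
        return matvec(self.up, h)


class TaskRoutedAdapters:
    """Adapters are shared by all tasks; each task owns its own router."""

    def __init__(self, dim, hidden, num_adapters, tasks, top_k, seed=0):
        rng = random.Random(seed)
        self.adapters = [SharedAdapter(dim, hidden, rng) for _ in range(num_adapters)]
        # One routing matrix per task (assumed linear router).
        self.routers = {t: make_matrix(num_adapters, dim, rng) for t in tasks}
        self.top_k = top_k

    def route(self, task, x):
        """Return (adapter_index, weight) pairs for the top-k adapters."""
        logits = matvec(self.routers[task], x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[: self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return list(zip(chosen, weights))

    def prompt(self, task, x):
        """Weighted sum of the selected adapters' outputs for one token."""
        out = [0.0] * len(x)
        for index, weight in self.route(task, x):
            adapted = self.adapters[index](x)
            out = [o + weight * a for o, a in zip(out, adapted)]
        return out


# Usage: the same token feature routed under two different tasks.
moa = TaskRoutedAdapters(dim=8, hidden=4, num_adapters=4,
                         tasks=["VIF", "MEF", "MFF"], top_k=2)
token = [0.5, -1.0, 0.25, 2.0, 0.0, 1.5, -0.75, 1.0]
print(moa.route("VIF", token))
print(moa.route("MFF", token))
```

Because only the routers differ between tasks, adding a task in this sketch means adding one small routing matrix while the adapters and the backbone stay unchanged, which is in the spirit of the parameter efficiency the summary describes.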

Paper Structure (20 sections, 17 equations, 17 figures, 10 tables)

Figures (17)

  • Figure 1: The prompt can adaptively select the complementary information from multi-source features. The dominant intensity bias varies according to the task, which is reflected by the different shades of colors.
  • Figure 2: An overview of our proposed TC-MoA method. Our approach gradually modulates the fusion results by inserting TC-MoA into the frozen ViT backbone. TC-MoA generates task-specific prompts through a task-specific router bank and a shared adapter bank. The fusion layer uses the prompt as a scale and source-specific embeddings as biases to obtain the fused images.
  • Figure 3: Qualitative comparisons of various methods in the VIF task.
  • Figure 4: Qualitative comparisons in the MEF task.
  • Figure 5: Qualitative comparisons in the MFF task.
  • ...and 12 more figures
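
The Figure 2 caption describes a fusion layer that treats the prompt as a scale and source-specific embeddings as biases. The sketch below shows one way such a layer could combine two sources element-wise. The exact formula is an assumption, not taken from the paper: here the two prompts are normalized so their weights sum to one per element, each source feature is scaled by its weight and shifted by its own bias, and the results are summed. The function name `fuse_features` and all arguments are hypothetical.

```python
import math


def fuse_features(feat_a, feat_b, prompt_a, prompt_b, bias_a, bias_b):
    """Element-wise fusion of two source features (illustrative only).

    Assumed form: w_a * feat_a + bias_a + w_b * feat_b + bias_b,
    where w_a = sigmoid(prompt_a - prompt_b) and w_b = 1 - w_a.
    """
    fused = []
    for fa, fb, pa, pb, ba, bb in zip(feat_a, feat_b, prompt_a, prompt_b,
                                      bias_a, bias_b):
        # Two-way softmax over the prompts, written as a sigmoid.
        weight_a = 1.0 / (1.0 + math.exp(pb - pa))
        weight_b = 1.0 - weight_a
        fused.append(weight_a * fa + ba + weight_b * fb + bb)
    return fused


# Usage: equal prompts give an even blend; a larger prompt for source A
# makes source A dominate that element.
even = fuse_features([2.0], [4.0], [0.0], [0.0], [0.1], [0.2])
a_dominant = fuse_features([2.0], [4.0], [20.0], [-20.0], [0.0], [0.0])
print(even, a_dominant)
```

Under this assumed form, the difference between the two prompts plays the role of the "dominant intensity" shown by the color shades in Figure 1: the larger it is for an element, the more that element is taken from one source.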