FusionBooster: A Unified Image Fusion Boosting Paradigm

Chunyang Cheng, Tianyang Xu, Xiao-Jun Wu, Hui Li, Xi Li, Josef Kittler

TL;DR

FusionBooster presents a universal, lightweight booster for image fusion that operates on any backbone by decomposing the initial fusion output with an information probe, reconstructing it via a nested autoencoder, and refining the components through a booster layer before reassembling them. The framework uses two losses, $Loss_{total} = Loss_{per} + Loss_{rec}$ with $Loss_{per} = Loss_{perA} + Loss_{perB}$, and trains in a two-stage process to preserve source information while improving output quality. Across infrared and visible (IVIF), multi-focus (MFIF), and multi-exposure (MEIF) image fusion tasks, and in downstream pedestrian detection, the booster yields consistent gains on multiple metrics (e.g., $VIF$, $Q_{abf}$, $EN$, $EI$, $SD$) and maintains a small computational footprint, with public code available. These results demonstrate the booster's generalization ability and practical value for robust fusion in challenging imaging conditions.
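The loss composition stated above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the summary gives only how the terms add up, so the use of mean squared error, the pairing of each probed component with its source image, and all function and argument names here are assumptions. The summary also mentions two-stage training, so the terms may not be optimized jointly in practice.

```python
from typing import List

Image = List[List[float]]  # tiny grayscale image as nested lists (illustrative only)


def mse(x: Image, y: Image) -> float:
    """Mean squared error between two equally sized images (assumed distance)."""
    diffs = [(a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry)]
    return sum(diffs) / len(diffs)


def total_loss(part_a: Image, part_b: Image,
               src_a: Image, src_b: Image,
               f_rec: Image, f_init: Image) -> float:
    """Loss_total = Loss_per + Loss_rec, with Loss_per = Loss_perA + Loss_perB.

    Assumption: each perception term compares a probed component with its
    source image, and the reconstruction term compares the rebuilt image
    with the initial fusion result.
    """
    loss_per_a = mse(part_a, src_a)
    loss_per_b = mse(part_b, src_b)
    loss_per = loss_per_a + loss_per_b
    loss_rec = mse(f_rec, f_init)
    return loss_per + loss_rec
```

With a learned model, `part_a`, `part_b`, and `f_rec` would be network outputs; here they are plain nested lists so the arithmetic is easy to follow.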

Abstract

In recent years, numerous ideas have emerged for designing a mutually reinforcing mechanism or extra stages for the image fusion task, ignoring the inevitable gaps between different vision tasks and the computational burden. We argue that there is scope to improve the fusion performance with the help of the FusionBooster, a model specifically designed for the fusion task. In particular, our booster is based on a divide-and-conquer strategy controlled by an information probe. The booster is composed of three building blocks: the probe units, the booster layer, and the assembling module. Given the result produced by a backbone method, the probe units assess the fused image and divide the result according to its information content. This is instrumental in identifying missing information, as a step toward its recovery. The recovery of the degraded components, along with the fusion guidance, is the role of the booster layer. Lastly, the assembling module is responsible for piecing these advanced components together to deliver the output. We use concise reconstruction loss functions in conjunction with lightweight autoencoder models to formulate the learning task, with a marginal increase in computational complexity. The experimental results obtained in various fusion tasks, as well as downstream detection tasks, consistently demonstrate that the proposed FusionBooster significantly improves the performance. Our code will be publicly available at https://github.com/AWCXV/FusionBooster.
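The data flow through the three building blocks (divide, refine, reassemble) can be sketched as follows. This is only a structural illustration under stated assumptions: in the paper the probe units and assembling module are learned autoencoder components, whereas `toy_probe`, `toy_assemble`, `identity_booster`, and `run_booster` below are hypothetical stand-ins with fixed arithmetic, chosen so that the parts sum back to the input.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]      # tiny grayscale image as nested lists (illustrative only)
Parts = Tuple[Image, Image]    # the two components produced by the probe


def toy_probe(f_init: Image) -> Parts:
    """Hypothetical stand-in for the probe units: split the fused image in two.

    The real probe divides by information content; an even split is used here
    purely so the example stays exact and easy to check.
    """
    part_a = [[0.5 * p for p in row] for row in f_init]
    part_b = [[0.5 * p for p in row] for row in f_init]
    return part_a, part_b


def toy_assemble(parts: Parts) -> Image:
    """Hypothetical stand-in for the assembling module: recombine the parts."""
    part_a, part_b = parts
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(part_a, part_b)]


def identity_booster(parts: Parts) -> Parts:
    """Placeholder booster layer that leaves the components unchanged."""
    return parts


def run_booster(f_init: Image,
                probe: Callable[[Image], Parts],
                booster: Callable[[Parts], Parts],
                assemble: Callable[[Parts], Image]) -> Image:
    """Divide the backbone output, refine the components, then reassemble."""
    parts = probe(f_init)
    refined = booster(parts)
    return assemble(refined)


f_init = [[0.2, 0.4], [0.6, 0.8]]
rebuilt = run_booster(f_init, toy_probe, identity_booster, toy_assemble)
```

With the identity booster the pipeline simply rebuilds the backbone output; swapping in a different `booster` callable is where any refinement of the components would happen.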

Paper Structure (29 sections, 12 equations, 21 figures, 7 tables)

Figures (21)

  • Figure 1: Comparison of the proposed FusionBooster and other advanced methods that contain additional enhancement models. Current algorithms suffer from high computational cost, a task gap, and limited generalization ability. (Backbone method: DDcGAN [ma2020ddcgan])
  • Figure 2: A comparison of the proposed divide-and-conquer boosting paradigm (b) and existing methods (a) that rely on other vision models as the booster. The disentangled components allow us to improve the fusion results in a fine-grained manner, and also give us the flexibility to handle more tasks, depending on the content.
  • Figure 3: A comparison of different learning-based image fusion methods. The AE-based method, DenseFuse [li2018densefuse], is biased toward the infrared modality, which leads to information loss in the fusion result (yellow boxes). However, as denoted by the red boxes, the AE-based method can produce more visually pleasing fused images than the other two paradigms (DDcGAN and MUFusion [cheng2023mufusion]).
  • Figure 4: The pipeline of the proposed FusionBooster for the MEIF task (Backbone: U2Fusion). Our booster is composed of three parts, i.e., the information probe, the booster layer, and the assembling (ASE) module. The information probe first perceives the source components $I_\textrm{partA}$ and $I_\textrm{partB}$ in the initial result. The ASE module will piece these components together to rebuild the initial result. In the test phase, the degraded components are fine-tuned in the booster layer and the ASE module correspondingly yields the enhanced result.
  • Figure 5: The architecture of the proposed nested AE network. The core of this model is composed of the probe units and the assembling module. We use the AE-based architecture to formulate these components. On the other hand, from an overall (external) perspective, our model can be regarded as an AE to reconstruct the initial fusion result $F_{\mathrm{init}}$. The encoder and decoder modules of our network consist of several convolutional layers.
  • ...and 16 more figures