Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion

Meng Zhou, Yuxuan Zhang, Xiaolan Xu, Jiayi Wang, Farzad Khalvati

TL;DR

This work proposes a novel CNN-based architecture that addresses limitations in fusion performance by introducing a Dilated Residual Attention Network Module for effective multiscale feature extraction, coupled with a gradient operator to enhance edge detail learning.

Abstract

Multimodal medical image fusion is a crucial task that combines complementary information from different imaging modalities into a unified representation, thereby enhancing diagnostic accuracy and treatment planning. While deep learning methods, particularly Convolutional Neural Networks (CNNs) and Transformers, have significantly advanced fusion performance, some existing CNN-based methods fall short in capturing fine-grained multiscale and edge features, leading to suboptimal feature integration. Transformer-based models, on the other hand, are computationally intensive in both the training and fusion stages, making them impractical for real-time clinical use. Moreover, the clinical application of fused images remains unexplored. In this paper, we propose a novel CNN-based architecture that addresses these limitations by introducing a Dilated Residual Attention Network Module for effective multiscale feature extraction, coupled with a gradient operator to enhance edge detail learning. To ensure fast and efficient fusion, we present a parameter-free fusion strategy based on the weighted nuclear norm of softmax, which requires no additional computations during training or inference. Extensive experiments, including a downstream brain tumor classification task, demonstrate that our approach outperforms various baseline methods in terms of visual quality, texture preservation, and fusion speed, making it a potentially practical solution for real-world clinical applications. The code will be released at https://github.com/simonZhou86/en_dran.
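
To make the parameter-free fusion idea concrete, the following is a minimal NumPy sketch of one plausible reading of "weighted nuclear norm of softmax". It is not taken from the paper or its repository. The function names (`channel_softmax`, `fuse_features`), the choice to apply softmax across channels, and the per-channel weight normalization are all assumptions made for illustration.

```python
import numpy as np


def channel_softmax(feat):
    """Softmax over the channel axis of a (C, H, W) feature tensor.

    Assumption: softmax is taken across channels; the source does not
    say which axis is used.
    """
    shifted = feat - feat.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def fuse_features(feat_a, feat_b, eps=1e-8):
    """Fuse two (C, H, W) feature tensors without learnable parameters.

    Hypothetical helper: each channel is weighted by the nuclear norm
    (sum of singular values) of its softmax-activated H x W map.
    """
    nuc_a = np.linalg.norm(channel_softmax(feat_a), ord="nuc", axis=(1, 2))
    nuc_b = np.linalg.norm(channel_softmax(feat_b), ord="nuc", axis=(1, 2))

    # Normalize so the two per-channel weights sum to one.
    w_a = nuc_a / (nuc_a + nuc_b + eps)
    w_b = 1.0 - w_a

    # Broadcast the (C,) weights over the spatial dimensions.
    return w_a[:, None, None] * feat_a + w_b[:, None, None] * feat_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mri_feat = rng.normal(size=(4, 8, 8))
    ct_feat = rng.normal(size=(4, 8, 8))
    print(fuse_features(mri_feat, ct_feat).shape)
```

Because the weights are computed directly from the encoder features, a rule of this kind adds nothing to train, which is in keeping with the abstract's description of the strategy as parameter-free.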

Paper Structure

This paper contains 12 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: An overview of the Stage 1 model in the proposed framework. "DConv3x3, r=c" denotes a dilated convolution with a 3$\times$3 kernel and a dilation rate of c. All Conv+LReLU layers in the decoder use a 3$\times$3 kernel followed by Leaky-ReLU. Note that the YCbCr conversion applies only to SPECT images.
  • Figure 2: A sample illustration of the fusion process, using MRI-SPECT fusion as an example. The process is the same for MRI-CT fusion, except that the RGB-to-YCbCr conversion is skipped.
  • Figure 3: Qualitative results for the MRI-CT (top three rows) and MRI-SPECT (bottom three rows) fusion tasks. We randomly select three sample pairs from each test set and show the fusion results across different methods. Zoom in for a better view.
  • Figure 4: Qualitative comparison of different fusion strategies on the MRI-CT and MRI-SPECT datasets. Left panel: visualization on the MRI-CT dataset; right panel: visualization on the MRI-SPECT dataset. For each dataset, we randomly sample a pair from the test set and zoom in on a selected region for a better view.
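
The captions for Figures 1 and 2 mention that SPECT images go through an RGB-to-YCbCr conversion while MRI-CT pairs do not. The sketch below shows how such a color-handling step is commonly wired around a grayscale fusion routine. It assumes the full-range BT.601 (JPEG-style) conversion and the usual convention of fusing only the luma (Y) channel with the MRI slice while carrying Cb and Cr through unchanged. The captions do not specify these details, and `fuse_mri_spect` and its `fuse_luma` callback are hypothetical names.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr for an (H, W, 3) array in [0, 255]."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def ycbcr_to_rgb(ycbcr):
    """Inverse of rgb_to_ycbcr, clipped back to [0, 255]."""
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 255.0)


def fuse_mri_spect(mri_gray, spect_rgb, fuse_luma):
    """Fuse a grayscale MRI slice with an RGB SPECT slice.

    Assumption: only the SPECT luma channel is fused with the MRI image;
    the chroma channels are reattached afterwards. `fuse_luma` stands in
    for any grayscale fusion routine (for example, a trained network).
    """
    ycbcr = rgb_to_ycbcr(spect_rgb)
    fused_y = fuse_luma(mri_gray.astype(np.float64), ycbcr[..., 0])
    fused = np.stack([fused_y, ycbcr[..., 1], ycbcr[..., 2]], axis=-1)
    return ycbcr_to_rgb(fused)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mri = rng.integers(0, 256, size=(16, 16))
    spect = rng.integers(0, 256, size=(16, 16, 3))
    # Placeholder fusion rule: element-wise maximum of the two luma images.
    out = fuse_mri_spect(mri, spect, np.maximum)
    print(out.shape)
```

For MRI-CT pairs both inputs are already grayscale, so under these assumptions the fusion routine would be applied to the two images directly, with no color conversion on either side.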