Learning Modality-Aware Representations: Adaptive Group-wise Interaction Network for Multimodal MRI Synthesis

Tao Song, Yicheng Wu, Minhao Hu, Xiangde Luo, Linda Wei, Guotai Wang, Yi Guo, Feng Xu, Shaoting Zhang

TL;DR

This work targets the synthesis of missing MRI modalities from multimodal data when cross-modality alignment is imperfect. It introduces AGI-Net, a plug-in convolutional framework whose Cross Group Attention and Group-wise Rolling components model intra- and inter-modality relationships and adapt convolutional kernels per modality group. In experiments on IXI and BraTS2023, AGI-Net achieves state-of-the-art results across 2D and 3D multimodal synthesis tasks, remains robust to misalignment, and improves brain-tumor segmentation when synthesized modalities are used. The approach offers a scalable, efficient path to better multimodal MRI synthesis with practical implications for clinical workflows.

Abstract

Multimodal MR image synthesis aims to generate missing modality images by effectively fusing and mapping from a subset of available MRI modalities. Most existing methods adopt an image-to-image translation paradigm, treating multiple modalities as input channels. However, these approaches often yield sub-optimal results due to the inherent difficulty in achieving precise feature- or semantic-level alignment across modalities. To address these challenges, we propose an Adaptive Group-wise Interaction Network (AGI-Net) that explicitly models both inter-modality and intra-modality relationships for multimodal MR image synthesis. Specifically, feature channels are first partitioned into predefined groups, after which an adaptive rolling mechanism is applied to conventional convolutional kernels to better capture feature and semantic correspondences between different modalities. In parallel, a cross-group attention module is introduced to enable effective feature fusion across groups, thereby enhancing the network's representational capacity. We validate the proposed AGI-Net on the publicly available IXI and BraTS2023 datasets. Experimental results demonstrate that AGI-Net achieves state-of-the-art performance in multimodal MR image synthesis tasks, confirming the effectiveness of its modality-aware interaction design. We release the relevant code at: https://github.com/zunzhumu/Adaptive-Group-wise-Interaction-Network-for-Multimodal-MRI-Synthesis.git.
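
To make the rolling idea in the abstract more concrete, the following is a minimal NumPy sketch of shifting a single 2D kernel by a fractional offset. It blends four integer-shifted copies of the kernel with bilinear weights. The function name `roll_kernel`, the circular wrap-around via `np.roll`, and the hand-picked per-group offsets are illustrative assumptions; in AGI-Net the offsets are predicted from the input features, and the released code may differ in these details.

```python
import numpy as np

def roll_kernel(kernel, dx, dy):
    """Shift a 2D kernel by a fractional offset (dx along columns, dy along rows).

    Hypothetical helper: blends four integer-shifted copies with bilinear weights.
    """
    x0, y0 = int(np.floor(dx)), int(np.floor(dy))
    fx, fy = dx - x0, dy - y0  # fractional parts, each in [0, 1)

    def shifted(sy, sx):
        # Circular integer displacement (assumption: weights wrap around the kernel).
        return np.roll(kernel, shift=(sy, sx), axis=(0, 1))

    return ((1 - fy) * (1 - fx) * shifted(y0, x0)
            + (1 - fy) * fx * shifted(y0, x0 + 1)
            + fy * (1 - fx) * shifted(y0 + 1, x0)
            + fy * fx * shifted(y0 + 1, x0 + 1))

# One shared 3x3 kernel, rolled differently for each (hypothetical) modality group.
kernel = np.arange(9, dtype=float).reshape(3, 3)
group_offsets = [(0.5, 0.0), (-0.25, 1.0)]  # illustrative (dx, dy) per group
rolled_kernels = [roll_kernel(kernel, dx, dy) for dx, dy in group_offsets]
```

Because the four bilinear weights sum to one and each shifted copy only permutes the weights, the rolled kernel keeps the same total weight as the original; an integer offset reduces to a plain `np.roll`.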

Paper Structure

This paper contains 24 sections, 4 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Comparison between rolling convolution and standard convolution with a 3-channel image input. (a) Illustration of standard convolution, where the locations of parameters in each convolution kernel remain fixed across channels. (b) Illustration of rolling convolution, where the convolution weights shift in a data-dependent manner to capture feature and semantic variations across different groups.
  • Figure 2: Illustration of the rolling process for a convolutional kernel of size 3. First, the floating-point offsets of the kernel along the x-axis and y-axis are predicted. Next, four sets of convolution kernels are generated through integer displacement operations. Finally, interpolation is used to obtain the kernel weights corresponding to the floating-point displacements.
  • Figure 3: Random translation perturbation test results with the pixel2pixel framework for the (T1, T2)->PD scenario on the IXI dataset. Random translation perturbation is applied to the pre-registered T2 modality images in the T1 and T2 pair.
  • Figure 4: An overview of our proposed CAGR module, which contains two components: Cross Group Attention and Group-wise Rolling. The Cross Group Attention module enhances the input features prior to the Group-wise Rolling module to reduce noise. Following this, the Group-wise Rolling module rolls the convolution kernels in a group-wise manner using the offsets learned from the enhanced input features.
  • Figure 5: (T1, T2)->PD synthesis results with the pixel2pixel framework on the IXI dataset. The first row presents the ground truth along with the synthesis results from ResUnet and AGI-Net. The second row shows an enlarged view of the region of interest (ROI), while the third row illustrates the synthesis error map.
  • ...and 1 more figures
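
The random translation perturbation mentioned in the Figure 3 caption can be simulated roughly as follows. This is a sketch under stated assumptions, not the paper's protocol: the helper names `translate` and `perturb_pair`, the integer-pixel shifts, the zero fill at the borders, the uniform sampling, and the `max_shift` parameter are all hypothetical. Only the T2 image is shifted, so the (T1, T2) pair becomes misaligned while T1 is left untouched.

```python
import numpy as np

def translate(image, shift_y, shift_x):
    """Shift a 2D image by integer pixels, filling exposed borders with zeros."""
    pad = max(abs(shift_y), abs(shift_x))
    padded = np.pad(image, pad)  # zero padding on every side
    h, w = image.shape
    top, left = pad - shift_y, pad - shift_x
    return padded[top:top + h, left:left + w]

def perturb_pair(t1, t2, max_shift, rng):
    """Randomly translate T2 only, so the pre-registered pair becomes misaligned."""
    # rng.integers excludes the upper bound, hence max_shift + 1.
    shift_y, shift_x = (int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    return t1, translate(t2, shift_y, shift_x), (shift_y, shift_x)

# Toy stand-ins for a registered T1/T2 slice pair (not real MRI data).
rng = np.random.default_rng(0)
t1 = np.arange(16, dtype=float).reshape(4, 4)
t2 = t1 + 100.0
t1_out, t2_shifted, applied_shift = perturb_pair(t1, t2, max_shift=2, rng=rng)
```

Sweeping `max_shift` over increasing values and re-evaluating a trained synthesis model on the perturbed pairs would give a robustness curve of the kind the caption describes.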