Disentangled Multimodal Brain MR Image Translation via Transformer-based Modality Infuser

Jihoon Cho, Xiaofeng Liu, Fangxu Xing, Jinsong Ouyang, Georges El Fakhri, Jinah Park, Jonghye Woo

TL;DR

This work addresses the challenge of synthesizing missing MR modalities by introducing a transformer-based modality infuser that converts modality-agnostic encoder features into modality-specific representations, enabling global self-attention over brain structures. The method disentangles modality-invariant and modality-specific features, uses modality encoding within a transformer, and optimizes with a combination of reconstruction, cycle-consistency, adversarial, and auxiliary modality losses. On BraTS 2018, it outperforms prior CNN-based approaches across multiple synthesis metrics and improves brain tumor segmentation when training with synthesized modalities. The approach offers a practical pathway to obtain complete multimodal information from limited scans, enhancing diagnostic workflows and data augmentation for medical imaging tasks.
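As a rough illustration of how the four loss terms named above might be combined into one generator objective, here is a minimal sketch in plain Python. Everything specific in it is an assumption made for illustration and is not taken from the paper: the L1 form of the reconstruction and cycle-consistency terms, the least-squares form of the adversarial term, the softmax cross-entropy form of the auxiliary modality term, the loss weights, and all function and variable names.

```python
import math

def l1(a, b):
    """Mean absolute error between two equally sized flat images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one sample (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]

def generator_loss(x, x_rec, x_cyc, d_fake_score, cls_logits, target_modality, weights):
    """Weighted sum of the four terms; returns (total, per-term values).

    x            : source image (flattened)
    x_rec        : source image reconstructed in its own modality
    x_cyc        : source image after translating to the target and back
    d_fake_score : discriminator score for the translated image
    cls_logits   : modality logits predicted for the translated image
    """
    terms = {
        "rec": l1(x, x_rec),                                  # reconstruction
        "cyc": l1(x, x_cyc),                                  # cycle-consistency
        "adv": (d_fake_score - 1.0) ** 2,                     # least-squares GAN form (assumed)
        "cls": cross_entropy(cls_logits, target_modality),    # auxiliary modality loss
    }
    total = sum(weights[name] * value for name, value in terms.items())
    return total, terms

# Hypothetical weights and toy two-pixel "images", chosen only for the example.
weights = {"rec": 10.0, "cyc": 10.0, "adv": 1.0, "cls": 1.0}
total, terms = generator_loss(
    x=[0.0, 1.0],
    x_rec=[0.0, 1.0],
    x_cyc=[0.5, 1.0],
    d_fake_score=1.0,
    cls_logits=[0.0, 0.0, 0.0, 0.0],   # one logit per modality: T1, T2, T1ce, FLAIR
    target_modality=2,
    weights=weights,
)
```

In this toy call the reconstruction is perfect and the discriminator is fully fooled, so only the cycle-consistency and auxiliary terms contribute to the total. In practice the relative weights would be tuned; the values here carry no meaning beyond the example.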

Abstract

Multimodal magnetic resonance (MR) imaging plays a crucial role in disease diagnosis because images of different modalities acquired from the same subject provide complementary information. Acquiring all MR modalities, however, can be expensive, and certain MR images may be missing from a scanning session depending on the study protocol. A typical solution is to synthesize the missing modalities from the acquired images, for example with generative adversarial networks (GANs). Yet GANs built on convolutional neural networks (CNNs) tend to lack both the ability to capture global relationships and a mechanism for conditioning on the desired modality. To address this, we propose a transformer-based modality infuser designed to synthesize multimodal brain MR images. In our method, we extract modality-agnostic features from the encoder and then transform them into modality-specific features using the modality infuser. Furthermore, the modality infuser captures long-range relationships among all brain structures, leading to the generation of more realistic images. We carried out experiments on the BraTS 2018 dataset, translating between four MR modalities, and our experimental results demonstrate the superiority of our proposed method in terms of synthesis quality. In addition, we conducted experiments on a brain tumor segmentation task and compared different conditioning methods.
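The core idea of the modality infuser, turning modality-agnostic features into modality-specific ones while letting every spatial location attend to every other, can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' architecture: the modality embedding table and its values are hypothetical, the embedding is simply added to each token, and a single attention head with identity query/key/value projections stands in for a full transformer block.

```python
import math

# Hypothetical modality embeddings; in a trained model these would be learned.
MODALITY_EMBEDDING = {
    "T1":    [0.5, 0.0],
    "T2":    [0.0, 0.5],
    "T1ce":  [0.5, 0.5],
    "FLAIR": [-0.5, 0.5],
}

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(tokens):
    """Scaled dot-product attention weights (identity Q/K projections, assumed)."""
    d = len(tokens[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        for q in tokens
    ]
    return [softmax(r) for r in scores]

def infuse(agnostic_tokens, target_modality):
    """Condition modality-agnostic tokens on a target modality.

    1. Add the target modality's embedding to every token.
    2. Mix tokens with global self-attention, so each output token
       depends on all spatial locations.
    """
    emb = MODALITY_EMBEDDING[target_modality]
    conditioned = [[t + e for t, e in zip(tok, emb)] for tok in agnostic_tokens]
    weights = attention_weights(conditioned)
    d = len(emb)
    return [
        [sum(w * v[c] for w, v in zip(row, conditioned)) for c in range(d)]
        for row in weights
    ]

# Three toy encoder tokens (e.g. three spatial locations), two channels each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
as_t1 = infuse(features, "T1")
as_t2 = infuse(features, "T2")
```

The same encoder features yield different outputs for different target modalities, and because the softmax weights are strictly positive, every output token draws on every input location. That global mixing is the property a purely convolutional generator with a limited receptive field does not provide by default.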

Paper Structure (7 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Illustration of our framework for translating MR images (left) and structure of the modality infuser (right). Our framework consists of a CNN encoder (Enc), a transformer-based modality infuser (MI), a CNN decoder (Dec), and a CNN discriminator (Dis).
  • Figure 2: Multimodal MR image synthesis results using our framework. The MR images of the first column are translated into the other modalities.
  • Figure 3: Feature visualization results. Brown depicts modality-agnostic features extracted from the CNN encoder, while the other colors represent features conditioned by the modality infuser (T1: blue, T2: green, T1ce: yellow, and FLAIR: purple).