Unified Multi-Modal Image Synthesis for Missing Modality Imputation

Yue Zhang; Chengtao Peng; Qiuli Wang; Dan Song; Kaiyan Li; S. Kevin Zhou

TL;DR

This work tackles missing modalities in multi-modal MRI by proposing a unified, single-model synthesis framework that imputes absent contrasts from arbitrary combinations of available inputs. It introduces a Commonality- and Discrepancy-Sensitive Encoder (CDS-Encoder) to exploit modality-invariant and modality-specific information, and a Dynamic Feature Unification Module (DFUM) to robustly fuse features from varying modality sets, using both hard and soft integration. A curriculum-learning strategy guides training across easy-to-hard missingness scenarios, and four discriminators enforce realistic, high-frequency detail through PatchGAN-based adversarial losses. Extensive experiments on BraTS and IXI demonstrate superior quantitative and qualitative performance across one-to-one, many-to-one, and unified synthesis tasks, along with downstream gains in tumor segmentation and cross-plane consistency. The results suggest a practical impact for missing-data imputation in clinical pipelines and potential extension to 3D volumetric synthesis, albeit with considerations for memory and cross-dataset generalizability.
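The easy-to-hard curriculum over missingness scenarios can be illustrated with a small sketch. The summary does not say how difficulty is measured, so the ordering rule here (fewer missing modalities counts as easier) is an assumption, and the helper names `availability_conditions` and `curriculum` are hypothetical.

```python
from itertools import product

# BraTS contrasts, as named in the figure captions of this paper.
MODALITIES = ("T1", "T2", "T1Gd", "FLAIR")

def availability_conditions(n):
    """All 0/1 masks with at least one available and at least one missing modality."""
    return [m for m in product((0, 1), repeat=n) if 0 < sum(m) < n]

def curriculum(masks):
    """Order masks easy-to-hard.

    Assumption (not from the paper): a scenario is easier when fewer
    modalities are missing, i.e. when the mask contains fewer zeros.
    """
    return sorted(masks, key=lambda m: (m.count(0), m))

stages = curriculum(availability_conditions(len(MODALITIES)))
for mask in stages:
    print("".join(str(bit) for bit in mask))
```

Under this assumed rule, training would begin with the four scenarios that miss a single contrast and end with the four that have only one contrast available.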

Abstract

Multi-modal medical images provide complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption, and differing imaging protocols often result in incomplete multi-modal images, which limits the use of multi-modal data in clinical practice. To address this issue, we propose a novel unified multi-modal image synthesis method for missing modality imputation. Our method adopts a generative adversarial architecture and aims to synthesize missing modalities from any combination of available ones with a single model. To this end, we design a Commonality- and Discrepancy-Sensitive Encoder for the generator to exploit both the modality-invariant and the modality-specific information contained in the input modalities. Incorporating both types of information facilitates the generation of images with consistent anatomy and realistic details of the desired distribution. In addition, we propose a Dynamic Feature Unification Module to integrate information from a varying number of available modalities, which makes the network robust to randomly missing modalities. The module performs both hard integration and soft integration, ensuring effective feature combination while avoiding information loss. Validated on two public multi-modal magnetic resonance datasets, the proposed method handles various synthesis tasks effectively and outperforms previous methods.
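To make the idea of fusing a varying number of modality features more concrete, here is a minimal sketch. The abstract does not define the two integration operators, so this sketch assumes that hard integration is an element-wise maximum and that soft integration is a softmax-weighted sum; the function names and the `scores` argument (per-modality attention logits) are hypothetical, and features are plain lists rather than feature maps.

```python
import math

def hard_integration(features):
    """Assumed form: element-wise maximum across the available modality features."""
    return [max(vals) for vals in zip(*features)]

def soft_integration(features, scores):
    """Assumed form: softmax-weighted sum; `scores` are hypothetical attention logits."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*features)]

def unify(features, scores):
    """Concatenate both summaries; the output size does not depend on the input count."""
    if not features:
        raise ValueError("at least one modality must be available")
    return hard_integration(features) + soft_integration(features, scores)

two = unify([[0.2, 0.9], [0.6, 0.1]], scores=[0.0, 0.0])  # two modalities available
one = unify([[0.2, 0.9]], scores=[0.0])                   # a single modality available
print(len(two), len(one))
```

Both calls return a vector of the same length, which is the property that lets a single decoder consume the fused representation regardless of which modalities are missing.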

Paper Structure (32 sections, 13 equations, 12 figures, 9 tables)

Introduction
Related Works
Method
Unified Multi-Modal Synthesis Framework
Generator
Commonality- and Discrepancy-Sensitive Encoder
Dynamic Feature Unification Module
Decoder
Loss Function
Discriminator
Training Scheme
Materials and Experiments
Materials
BraTS Dataset
IXI Dataset
...and 17 more sections

Figures (12)

Figure 1: A schematic view of the proposed unified multi-modal image synthesis method.
Figure 2: Illustration of the detailed structures of the common encoding stream ($ES_C$), modality-specific encoding stream ($ES_i$), and modality-specific decoding streams ($DS_i$).
Figure 3: Illustration of the Dynamic Feature Unification Module (DFUM). (a) The scenario in which multiple modalities are available. (b) The scenario in which only a single modality is available. (c) The detailed structure of the attention block.
Figure 4: Visual examples of synthetic images produced by our method on the BraTS dataset. The four-bit digits represent the Availability Conditions of T1, T2, T1Gd, and FLAIR modalities, in which "0" represents the "missing" modality and "1" represents the "available" modality. Yellow boxes emphasize the obvious difference between images. The yellow decimals represent PSNR values.
Figure 5: Visual examples of synthetic images produced by our method on the IXI dataset. The three-bit digits represent the Availability Conditions of T1, T2, and PD modalities, in which "0" represents "missing" modality and "1" represents "available" modality. The yellow decimals represent PSNR values.
...and 7 more figures
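The Availability Condition codes used in the captions of Figures 4 and 5 can be read mechanically. The sketch below decodes such a code into available and missing modalities, using the modality order stated in the captions; the function name `decode` and the example codes are illustrative only.

```python
# Modality order as stated in the captions of Figures 4 and 5.
BRATS = ("T1", "T2", "T1Gd", "FLAIR")
IXI = ("T1", "T2", "PD")

def decode(code, modalities):
    """Split a bit string such as "1010" into (available, missing) modality names."""
    if len(code) != len(modalities) or set(code) - {"0", "1"}:
        raise ValueError(f"expected {len(modalities)} characters, each '0' or '1'")
    available = [m for m, bit in zip(modalities, code) if bit == "1"]
    missing = [m for m, bit in zip(modalities, code) if bit == "0"]
    return available, missing

print(decode("1010", BRATS))  # T1 and T1Gd available; T2 and FLAIR to be synthesized
print(decode("011", IXI))     # T2 and PD available; T1 to be synthesized
```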
