Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment
Shiyun Chen, Li Lin, Pujin Cheng, ZhiCheng Jin, JianJian Chen, HaiDong Zhu, Kenneth K. Y. Wong, Xiaoying Tang
TL;DR
Diff4MMLiTS addresses the challenge of unregistered multimodal liver CT data with a four-stage pipeline. It first pre-registers the target organ across modalities. It then generates tumor-free (normal) CTs via inpainting and synthesizes strictly aligned multimodal CTs with tumors using a latent diffusion model. Finally, it trains a multimodal segmenter on a hybrid real-synthetic dataset. The approach removes the reliance on perfectly aligned data, outperforms competing methods on mmLiTs, and generalizes to LiTS, with consistent gains across segmentation backbones and improved data efficiency. Key contributions include the Normal CT Generator, the Latent Diffusion–based Multimodal CT Synthesizer, and a hybrid training regimen that uses synthetic data to improve segmentation accuracy. The work enables robust liver tumor segmentation in clinical settings where multimodal alignment is imperfect, which could improve diagnostic precision and treatment planning.
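One way to picture the hybrid real-synthetic training regimen is to build each training epoch from all real cases plus a sampled share of synthetic ones. The summary does not say how the two sources are mixed, so the ratio-based sampler below is an assumption. `Sample`, `build_hybrid_epoch`, and `synth_ratio` are hypothetical names, and a real pipeline would hold image and mask tensors rather than case IDs.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Sample:
    case_id: str
    synthetic: bool  # True if produced by the (hypothetical) synthesizer stage


def build_hybrid_epoch(
    real: Sequence[Sample],
    synthetic: Sequence[Sample],
    synth_ratio: float,
    seed: int = 0,
) -> List[Sample]:
    """Mix every real case with a random subset of synthetic cases.

    synth_ratio is a hypothetical knob: synthetic samples drawn per real
    sample, capped by the size of the synthetic pool.
    """
    if synth_ratio < 0:
        raise ValueError("synth_ratio must be non-negative")
    rng = random.Random(seed)
    n_synth = min(len(synthetic), round(len(real) * synth_ratio))
    # Keep all real cases; subsample synthetic ones without replacement.
    epoch = list(real) + rng.sample(list(synthetic), n_synth)
    rng.shuffle(epoch)
    return epoch


real = [Sample(f"real_{i}", False) for i in range(4)]
synth = [Sample(f"syn_{i}", True) for i in range(10)]
epoch = build_hybrid_epoch(real, synth, synth_ratio=1.5)
print(len(epoch), sum(s.synthetic for s in epoch))  # prints: 10 6
```

Keeping every real case and only subsampling the synthetic pool means the real data always anchors training. The ratio then controls how much synthetic data supplements it.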
Abstract
Multimodal learning has been shown to improve performance across a range of clinical tasks, owing to the complementary perspectives that different data modalities provide. However, existing multimodal segmentation methods rely on well-registered multimodal data. This assumption is unrealistic for real-world clinical images, particularly for indistinct and diffuse regions such as liver tumors. In this paper, we introduce Diff4MMLiTS, a four-stage multimodal liver tumor segmentation pipeline: (1) pre-registration of the target organs in multimodal CTs; (2) dilation of the tumor mask from the annotated modality, followed by its use in inpainting to obtain tumor-free (normal) multimodal CTs; (3) synthesis of strictly aligned multimodal CTs with tumors using a latent diffusion model conditioned on multimodal CT features and randomly generated tumor masks; and (4) training of the segmentation model. Because the synthesized CTs are aligned by construction, the pipeline eliminates the need for strictly aligned multimodal data. Extensive experiments on public and internal datasets show that Diff4MMLiTS outperforms other state-of-the-art multimodal segmentation methods.
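The abstract names two mask operations without fixing their parameters. Stage 2 dilates the annotated tumor mask before inpainting, and stage 3 generates random tumor masks to condition the synthesizer. The NumPy sketch below illustrates both under stated assumptions. The dilation is face-connected, and its iteration count stands in for the unspecified margin. Tumors are modeled as ellipsoids centred on a random liver voxel. `dilate_mask` and `random_tumor_mask` are hypothetical helpers, not the authors' implementation, and the inpainting and latent diffusion models themselves are not shown.

```python
import numpy as np


def dilate_mask(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a face-connected (cross-shaped) neighbourhood.

    Works for 2D slices or 3D volumes. The margin used by Diff4MMLiTS is
    not given in the abstract, so `iterations` is an assumption.
    """
    out = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        grown = out.copy()
        for axis in range(out.ndim):
            for start in (0, 2):  # neighbour at -1 (start=0) and +1 (start=2)
                window = tuple(
                    slice(start, start + n) if a == axis else slice(1, 1 + n)
                    for a, n in enumerate(out.shape)
                )
                grown |= padded[window]
        out = grown
    return out


def random_tumor_mask(liver: np.ndarray, rng: np.random.Generator,
                      radius_range=(3.0, 8.0)) -> np.ndarray:
    """Hypothetical tumor-mask generator: a random ellipsoid centred on a
    liver voxel and clipped to the liver mask."""
    liver = np.asarray(liver, dtype=bool)
    coords = np.argwhere(liver)
    if len(coords) == 0:
        raise ValueError("liver mask is empty")
    center = coords[rng.integers(len(coords))]
    radii = rng.uniform(*radius_range, size=liver.ndim)
    grid = np.indices(liver.shape)
    # Normalised squared distance; values <= 1 lie inside the ellipsoid.
    dist = sum(((grid[a] - center[a]) / radii[a]) ** 2 for a in range(liver.ndim))
    return (dist <= 1.0) & liver


# Stage 2 (sketch): enlarge the annotated tumor so the inpainting region
# fully covers the lesion boundary.
tumor = np.zeros((9, 9), dtype=bool)
tumor[4, 4] = True
inpaint_region = dilate_mask(tumor, iterations=2)
print(int(inpaint_region.sum()))  # prints 13

# Stage 3 (sketch): a fresh tumor mask that could condition the synthesizer.
liver = np.zeros((32, 32, 32), dtype=bool)
liver[8:24, 8:24, 8:24] = True
new_tumor = random_tumor_mask(liver, np.random.default_rng(0))
```

For real volumes, `scipy.ndimage.binary_dilation` with `iterations=k` and its default cross-shaped structuring element gives the same result as `dilate_mask`. The pure-NumPy version only keeps the sketch dependency-light.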
