TMI-CLNet: Triple-Modal Interaction Network for Chronic Liver Disease Prognosis From Imaging, Clinical, and Radiomic Data Fusion
Linglong Wu, Xuhao Shan, Ruiquan Ge, Ruoyu Liang, Chi Zhang, Yonghong Li, Ahmed Elazab, Huoling Luo, Yunbi Liu, Changmiao Wang
TL;DR
Chronic liver disease prognosis benefits from multimodal data, but integrating CT imaging, radiomic features, and clinical information is challenging because the modalities are heterogeneous. The authors propose TMI-CLNet, a triple-modal network with three components: an Intra-Modality Aggregation (IMA) module that refines each modality, a Triple-Modal Cross-Attention Fusion (TCAF) module that extracts cross-modal interactions, and a Triple-Modal Feature Fusion (TMFF) loss that aligns representations across modalities. On a private cohort of 184 patients evaluated with 5-fold cross-validation, TMI-CLNet outperforms unimodal and other multimodal baselines, reaching an accuracy of 83.12% and an AUC of 0.8223, and ablations support the contribution of each module. The results suggest potential for improved, data-driven prognosis in chronic liver disease, and the design may extend to other diseases or modalities that pose similar heterogeneity challenges.
Abstract
Chronic liver disease represents a significant health challenge worldwide, and accurate prognostic evaluations are essential for personalized treatment plans. Recent evidence suggests that integrating multimodal data, such as computed tomography imaging, radiomic features, and clinical information, can provide more comprehensive prognostic information. However, modalities are inherently heterogeneous, and incorporating additional modalities may exacerbate the challenges of heterogeneous data fusion. Moreover, existing multimodal fusion methods often struggle to adapt to richer medical modalities, making it difficult to capture inter-modal relationships. To overcome these limitations, we present the Triple-Modal Interaction Chronic Liver Network (TMI-CLNet). Specifically, we develop an Intra-Modality Aggregation module and a Triple-Modal Cross-Attention Fusion module, which are designed to eliminate intra-modality redundancy and extract cross-modal information, respectively. Furthermore, we design a Triple-Modal Feature Fusion loss function to align feature representations across modalities. Extensive experiments on a liver prognosis dataset demonstrate that our approach significantly outperforms existing state-of-the-art unimodal models and other multimodal techniques. Our code is available at https://github.com/Mysterwll/liver.git.
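
To make the idea of cross-attention over three modalities concrete, the following is a minimal sketch, not the authors' implementation. It assumes each modality (image, radiomic, clinical) has already been embedded as a list of token vectors in a shared space, and it lets each modality attend to the tokens of the other two using plain scaled dot-product attention. The function names, the toy inputs, and the omission of learned query/key/value projections and multiple heads are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention with identity projections (assumption).

    queries: list of d-dim token vectors from one modality
    context: list of d-dim token vectors from the other modalities
    Returns one d-dim vector per query token.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this query token to every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, context)) for i in range(d)]
        )
    return attended


def triple_cross_attention(image, radiomic, clinical):
    """Hypothetical helper: each modality attends to the other two."""
    return {
        "image": cross_attend(image, radiomic + clinical),
        "radiomic": cross_attend(radiomic, image + clinical),
        "clinical": cross_attend(clinical, image + radiomic),
    }


# Toy token embeddings in a shared 4-dim space (made-up values).
image = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
radiomic = [[0.5, 0.5, 0.0, 0.0]]
clinical = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.5, 0.5]]

fused = triple_cross_attention(image, radiomic, clinical)
```

Each output token here is a convex combination of tokens from the other two modalities, which is the basic mechanism by which cross-attention mixes information across modalities. A practical model would add learned projections, residual connections, and normalization around this step.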
