Benchmarking Multi-Organ Segmentation Tools for Multi-Parametric T1-weighted Abdominal MRI
Nicole Tran, Anisa Prasad, Yan Zhuang, Tejas Sudharshan Mathai, Boah Kim, Sydney Lewis, Pritam Mukherjee, Jianfei Liu, Ronald M. Summers
TL;DR
This work tackles the problem of robust multi-organ segmentation in multi-parametric abdominal MRI by benchmarking three public tools (MRSegmentator, TotalSegmentator MRI, TotalVibeSegmentator) on a curated 40-volume T1-weighted dataset derived from the Duke Liver Dataset. The authors quantify performance across four sequence types (pre-contrast, arterial, venous, and delayed) using 10 labeled abdominal structures and two metrics, Dice similarity and Hausdorff Distance, with statistical tests to assess cross-tool differences. MRSegmentator achieves the best overall performance, with a Dice of $80.7 \pm 18.6$ and an HD of $8.9 \pm 10.4$ mm ($p<0.001$ vs. the others), and shows consistent superiority across large, medium, and small organs. The results highlight the influence of training data (MRI vs CT mixtures) on generalization to abdominal MRI and inform tool selection and dataset design for reliable abdominal organ segmentation in clinical workflows.
Abstract
The segmentation of multiple organs in multi-parametric MRI studies is critical for many applications in radiology, such as correlating imaging biomarkers with disease status (e.g., cirrhosis, diabetes). Recently, three publicly available tools, namely MRSegmentator (MRSeg), TotalSegmentator MRI (TS), and TotalVibeSegmentator (VIBE), have been proposed for multi-organ segmentation in MRI. However, the performance of these tools on specific MRI sequence types has not yet been quantified. In this work, a subset of 40 volumes from the public Duke Liver Dataset was curated. The curated dataset contained 10 volumes from each of four phases: pre-contrast fat-saturated T1w, arterial T1w, venous T1w, and delayed T1w. Ten abdominal structures were manually annotated in these volumes. Next, the performance of the three public tools was benchmarked on this curated dataset. The results indicated that MRSeg obtained a Dice score of 80.7 $\pm$ 18.6 and a Hausdorff Distance (HD) error of 8.9 $\pm$ 10.4 mm. MRSeg performed best ($p < .05$) across the different sequence types compared with TS and VIBE.
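As a rough illustration of the two metrics named above, the sketch below computes a Dice score and a symmetric Hausdorff Distance for a single organ mask. It is not the authors' evaluation code: the function names `dice_score` and `hausdorff_distance`, the `spacing` parameter, and the choice to measure HD over all foreground voxels (rather than surface voxels only) are assumptions made for this example. The Dice value is returned on a 0 to 1 scale; the reported 80.7 appears to be on a 0 to 100 scale.

```python
import numpy as np


def dice_score(pred, ref):
    """Dice overlap between two binary masks, in [0, 1]."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom


def hausdorff_distance(pred, ref, spacing=None):
    """Symmetric Hausdorff Distance between the foreground voxels of two masks.

    `spacing` is the voxel size per axis (e.g. in mm); defaults to 1 per axis.
    Brute-force pairwise distances: fine for small masks, too memory-hungry
    for full-resolution volumes.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if spacing is None:
        spacing = np.ones(pred.ndim)
    spacing = np.asarray(spacing, dtype=float)

    # Physical coordinates of every foreground voxel.
    p = np.argwhere(pred) * spacing
    r = np.argwhere(ref) * spacing
    if len(p) == 0 or len(r) == 0:
        # Assumed convention: undefined when either mask is empty.
        return float("inf")

    # d[i, j] = Euclidean distance from pred voxel i to ref voxel j.
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    pred_to_ref = d.min(axis=1).max()  # farthest pred voxel from ref
    ref_to_pred = d.min(axis=0).max()  # farthest ref voxel from pred
    return float(max(pred_to_ref, ref_to_pred))
```

In a benchmark like the one described, such functions would be applied per organ label and per volume, and the per-case values then summarized as mean $\pm$ standard deviation for each tool and sequence type.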
