MDNet: Multi-Decoder Network for Abdominal CT Organs Segmentation

Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas, Gorkem Durak, Matthew Antalek, Zheyuan Zhang, Bin Wang, Md Mostafijur Rahman, Hongyi Pan, Alpay Medetalibeyoglu, Yury Velichko, Daniela Ladner, Amir Borhani, Ulas Bagci

TL;DR

This work tackles automated segmentation of the liver and spleen from abdominal CT, where organ shape, size, and anatomical context vary substantially. It introduces MDNet, a multi-decoder encoder-decoder architecture that uses a pre-trained Mix Transformer encoder (MiT-B2), MSFED blocks for multi-scale feature fusion, and Mask Attention to provide spatial guidance across decoders, producing progressively refined masks M1, M2, and M3. The authors report state-of-the-art performance on the LiTS and MSD Spleen datasets, with high Dice coefficients and low Hausdorff distances, and argue that the intermediate decoder outputs and visual heatmaps make the model more interpretable. They plan to extend the approach to MRI data and volumetric analysis.

Abstract

Accurate segmentation of organs from abdominal CT scans is essential for clinical applications such as diagnosis, treatment planning, and patient monitoring. To handle the heterogeneity of organ shapes and sizes and the complex anatomical relationships between organs, we propose MDNet, an encoder-decoder network that uses the pre-trained MiT-B2 as the encoder together with multiple decoder networks. Each decoder network is connected to a different part of the encoder via a multi-scale feature enhancement dilated block. With each decoder, we iteratively increase the depth of the network and refine the segmentation mask, enriching the feature maps by integrating the feature maps of the previous decoders. To refine the feature map further, we also pass the mask predicted by the previous decoder to the current decoder, which provides spatial attention across foreground and background regions. MDNet achieves a Dice similarity coefficient (DSC) of 0.9013 on the Liver Tumor Segmentation (LiTS) dataset and 0.9169 on the MSD Spleen dataset. It also reduces the Hausdorff distance (HD) to 3.79 on LiTS and 2.26 on MSD Spleen, underscoring the precision of MDNet in capturing complex contours. Moreover, MDNet is more interpretable and robust than the baseline models.
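The two metrics quoted in the abstract can be illustrated with a short sketch. The code below is not the authors' evaluation code: it assumes 2D binary masks, computes the Hausdorff distance by brute force over all foreground pixels (in pixel units), and treats two empty masks as perfect agreement. The function names `dice_coefficient` and `hausdorff_distance` are hypothetical, and the paper may use a different HD variant or spacing convention.

```python
import numpy as np


def dice_coefficient(pred, target):
    """DSC = 2 * |P intersect T| / (|P| + |T|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between foreground pixel sets (pixels)."""
    p = np.argwhere(np.asarray(pred, dtype=bool))
    t = np.argwhere(np.asarray(target, dtype=bool))
    if len(p) == 0 or len(t) == 0:
        # Assumption: the distance is undefined for an empty mask.
        return float("inf")
    # Pairwise Euclidean distances; brute force, suitable for small masks only.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy example: a 4x4 square and the same square shifted right by one pixel.
target = np.zeros((8, 8), dtype=np.uint8)
target[2:6, 2:6] = 1
pred = np.zeros((8, 8), dtype=np.uint8)
pred[2:6, 3:7] = 1

print(dice_coefficient(pred, target))    # 0.75
print(hausdorff_distance(pred, target))  # 1.0
```

A higher DSC means more overlap with the ground truth, while a lower HD means the worst-case contour error is smaller, which is why the abstract reports the two together.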

Paper Structure (13 sections, 1 equation, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Block diagram of the proposed MDNet. A MiT-B2 encoder processes the input image and extracts feature maps at four levels (F1, F2, F3, and F4). Each decoder network is connected to a different part of the encoder via a multi-scale feature enhancement dilated block, which increases the depth of the network, and the decoders together predict three distinct segmentation masks. The decoders are also chained: the output features of the preceding decoders are reused in the subsequent one to refine the segmentation output, and the mask predicted by the prior decoder is fed to the subsequent decoder to refine its feature map. This provides spatial attention across foreground and background regions and improves the final segmentation result.
  • Figure 2: Qualitative comparison of models on the LiTS [bilic2023liver] and MSD Spleen [antonelli2022medical] datasets.
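
The Figure 1 caption says that the mask predicted by one decoder gives the next decoder spatial attention over foreground and background regions. The sketch below shows one plausible reading of that idea, not the paper's actual Mask Attention module: it assumes the previous mask is a probability map with the same spatial size as the feature map, weights the features by the mask and by its complement, and concatenates the two along the channel axis. The function name `mask_attention` and the tensor shapes are hypothetical, and the real module may include learned layers that are omitted here.

```python
import numpy as np


def mask_attention(features, prev_mask):
    """Split a feature map into foreground- and background-weighted copies.

    features:  array of shape (C, H, W)
    prev_mask: array of shape (H, W) with values in [0, 1], e.g. the
               sigmoid output of the previous decoder (an assumption).
    Returns an array of shape (2C, H, W).
    """
    features = np.asarray(features, dtype=np.float32)
    prev_mask = np.asarray(prev_mask, dtype=np.float32)
    if prev_mask.shape != features.shape[1:]:
        raise ValueError("mask and feature map must share spatial size")
    # Broadcasting over the channel axis applies the same mask to every channel.
    foreground = features * prev_mask[None]          # emphasises the predicted organ
    background = features * (1.0 - prev_mask[None])  # emphasises everything else
    return np.concatenate([foreground, background], axis=0)


# Toy example: 4 channels of ones and a square mask from a "previous decoder".
features = np.ones((4, 8, 8), dtype=np.float32)
prev_mask = np.zeros((8, 8), dtype=np.float32)
prev_mask[2:6, 2:6] = 1.0

attended = mask_attention(features, prev_mask)
print(attended.shape)  # (8, 8, 8)
```

Keeping the background-weighted copy alongside the foreground one lets a later layer see both where the previous decoder predicted the organ and where it did not, which matches the caption's description of attention across both regions.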