Volumetric medical image segmentation through dual self-distillation in U-shaped networks

Soumyanil Banerjee, Nicholas Summerfield, Ming Dong, Carri Glide-Hurst

TL;DR

This work improves 3D medical image segmentation by introducing Dual Self-Distillation (DSD) for U-shaped networks. The method combines deep supervision, in which the ground-truth labels supervise the decoder outputs, with encoder and decoder self-distillation, in which the deepest encoder and decoder layers act as teachers that guide the shallower layers via KL divergence, implemented through lightweight bottleneck modules. Across the MMWHS, BraTS, and Hippocampus datasets, DSD consistently increases Dice scores and reduces boundary errors with negligible increases in parameters and training time, outperforming comparable self-distillation approaches such as MISSU. Overall, DSD is a versatile, efficient training strategy that improves segmentation quality for diverse U-shaped backbones without requiring large pretrained teachers.

Abstract

U-shaped networks and their variants have demonstrated exceptional results for medical image segmentation. In this paper, we propose a novel dual self-distillation (DSD) framework in U-shaped networks for volumetric medical image segmentation. DSD distills knowledge from the ground-truth segmentation labels to the decoder layers. DSD also distills knowledge from the deepest decoder and encoder layers to the shallower decoder and encoder layers, respectively, of a single U-shaped network. DSD is a general training strategy that can be attached to the backbone architecture of any U-shaped network to further improve its segmentation performance. We attached DSD to several state-of-the-art U-shaped backbones, and extensive experiments on various public 3D medical image segmentation datasets (cardiac substructure, brain tumor and Hippocampus) demonstrated significant improvement over the same backbones without DSD. On average, after attaching DSD to the U-shaped backbones, we observed an increase of 2.82%, 4.53% and 1.3% in Dice similarity score and a decrease of 7.15 mm, 6.48 mm and 0.76 mm in the Hausdorff distance for cardiac substructure, brain tumor and Hippocampus segmentation, respectively. These improvements were achieved with a negligible increase in the number of trainable parameters and training time. Our proposed DSD framework also led to significant qualitative improvements for cardiac substructure, brain tumor and Hippocampus segmentation over the U-shaped backbones. The source code is publicly available at https://github.com/soumbane/DualSelfDistillation.
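
To make the two distillation paths concrete, the following is a minimal pure-Python sketch of how such a training objective could be assembled. It is not the authors' implementation (see the repository linked above for that). The function names, the weights `w_ds` and `w_kl`, the use of plain cross-entropy for the supervised terms, and the convention that the last entry of each list is the deepest (teacher) layer are all assumptions made for illustration.

```python
import math

def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) for two discrete class distributions."""
    return sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student))

def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(g * math.log(p + eps) for g, p in zip(one_hot, probs))

def mean_over_voxels(fn, a, b):
    """Average a per-voxel loss over two equally long lists of distributions."""
    return sum(fn(x, y) for x, y in zip(a, b)) / len(a)

def dsd_loss(y, g, decoder_outs, encoder_outs, w_ds=1.0, w_kl=1.0):
    """Hypothetical combined objective.

    y, g: network softmax output and one-hot labels, one vector per voxel.
    decoder_outs, encoder_outs: bottleneck-module outputs, assumed to be
    ordered shallowest -> deepest and already at label resolution.
    """
    # Supervised loss on the final network output.
    loss = mean_over_voxels(cross_entropy, g, y)
    # Deep supervision: ground-truth labels supervise every decoder output.
    for d in decoder_outs:
        loss += w_ds * mean_over_voxels(cross_entropy, g, d)
    # Self-distillation: the deepest layer on each side teaches the shallower ones.
    for outs in (decoder_outs, encoder_outs):
        teacher = outs[-1]
        for student in outs[:-1]:
            loss += w_kl * mean_over_voxels(kl_divergence, teacher, student)
    return loss
```

In this sketch the extra terms exist only in the loss, which is consistent with the claim that the additional cost is confined to training.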

Paper Structure (17 sections, 6 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Self-distillation demonstrated with a U-shaped network for volumetric medical image segmentation. $Y$ and $G$ indicate the softmax of the network output and the one-hot encoded GT Labels, respectively. $E_i \vert_{i=1}^{Z}$ and $D_i \vert_{i=1}^{Z}$ denote the output of the bottleneck module on the encoder and decoder side, respectively. $T$ and $S$ denote the teacher and student probability distributions, respectively. All dashed lines shown are only used during training and removed during inference. The input is shown as a stack of images to indicate a 3D CT volume, and the output shows one slice of the GT Labels overlaid on a 2D CT slice for a clear visualization of the different segmentation classes.
  • Figure 2: Qualitative comparison of an axial slice and 3D volumes, with GT Labels (on CTA) and predictions with (A) UNETR and (B) nnU-Net, highlighting the improved segmentations (shown by Dice score (%)) with our proposed DSD framework. The Dice score (%) is for the full 3D volume of a patient belonging to the validation set.
  • Figure 3: Qualitative comparison of an axial slice with GT Labels (on FLAIR MRI) and predictions from (A) UNETR, (B) nnU-Net and (C) Swin UNETR, highlighting the improvements in segmentation (shown by Dice score (%)) with our proposed DSD framework. The Dice score (%) is for the full 3D volume of a patient belonging to the testing set.
  • Figure 4: The mean and standard deviation of Dice score (%) for all patients in the testing set for each foreground class of MSD-BraTS.
  • Figure 5: Qualitative comparison of an axial slice with GT Labels (on T1w MRI) and predictions from (A) VNET, (B) UNETR and (C) nnU-Net, highlighting the improvements in segmentation (shown by Dice score (%)) with our proposed DSD framework. The Dice score (%) is for the full 3D volume of a patient belonging to the testing set.
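
The figure captions above report a per-volume Dice score (%). As a reminder of how that number is typically obtained, here is a small sketch that computes the Dice score for one class from flattened voxel label lists. The function name `dice_percent` and the handling of a class absent from both volumes are assumptions for illustration, not details taken from the paper.

```python
def dice_percent(pred, gt, label):
    """Dice score (%) for one class, given flattened per-voxel label lists."""
    pred_mask = [p == label for p in pred]
    gt_mask = [t == label for t in gt]
    # Voxels assigned to the class in both prediction and ground truth.
    intersection = sum(p and t for p, t in zip(pred_mask, gt_mask))
    denom = sum(pred_mask) + sum(gt_mask)
    if denom == 0:
        # Assumption: a class absent from both volumes counts as a perfect match.
        return 100.0
    return 100.0 * 2 * intersection / denom
```

Averaging this value over the foreground classes of a volume gives a single per-patient score of the kind quoted in the qualitative figures.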