Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael E. Kim, Rendong Zhang, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

TL;DR

This work tackles the challenge of segmenting a single CT slice with 2D networks, which lack 3D context and lag behind 3D models. It introduces a novel unpaired 3D-to-2D distillation framework that transfers contextual knowledge from a pretrained 3D teacher to a 2D student through class-wise prototypes and an extended DIST objective that preserves inter- and intra-class relationships. Key contributions include (1) the first 3D-to-2D distillation approach in medical imaging, (2) learning the 3D prototype during pretraining so that no 3D input is needed at inference, and (3) demonstrated improvements across multiple 2D architectures and in low-data regimes, reducing the annotation burden. The approach leverages abundant 3D data to boost 2D single-slice multi-organ segmentation, with potential benefits for efficiency and scalability in clinical workflows.
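
The class-wise prototype idea can be illustrated with a short NumPy sketch. It assumes, as one plausible reading of the Figure 2 description, that the prototype of class k is the mean softmax prediction vector over all pixels (2D) or voxels (3D) labeled k, giving a C x C matrix whose row k should peak at channel k. The function names `softmax` and `class_prototypes` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def class_prototypes(logits, labels, num_classes):
    """Hypothetical sketch of a class-wise prototype matrix.

    logits: (C, ...) raw network outputs for one 2D slice or 3D volume.
    labels: (...) integer ground-truth labels with the same spatial shape.
    Returns a (C, C) array whose row k is the mean softmax vector over
    all positions labeled k (rows of classes absent from the input stay zero).
    """
    probs = softmax(logits, axis=0).reshape(num_classes, -1)  # (C, N)
    flat_labels = labels.reshape(-1)                           # (N,)
    protos = np.zeros((num_classes, num_classes))
    for k in range(num_classes):
        mask = flat_labels == k
        if mask.any():
            # Average the probability vectors of all positions labeled k.
            protos[k] = probs[:, mask].mean(axis=1)
    return protos


# Illustrative usage with random data; 7 classes (background + 6 organs) is an assumption.
rng = np.random.default_rng(0)
labels_2d = rng.integers(0, 7, size=(64, 64))
logits_2d = rng.normal(size=(7, 64, 64))
print(class_prototypes(logits_2d, labels_2d, 7).shape)  # (7, 7)
```

Because the function flattens all spatial axes, the same code accepts 2D logits of shape (C, H, W) and 3D logits of shape (C, D, H, W). A dataset-level 3D prototype, as described for the pretrained teacher, would instead accumulate per-class sums and position counts across all scans before dividing; this sketch covers a single input.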

Abstract

2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data require 2D networks for segmentation, and these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework that leverages pretrained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations to guide the 2D student in learning intra- and inter-class correlations. Unlike traditional knowledge distillation methods, which require the same input for teacher and student, our approach uses unpaired 3D CT scans of any contrast to guide the 2D student model. Experiments on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods benefit from the 3D teacher model, achieving improved single-slice multi-organ segmentation performance. Notably, our approach is particularly effective in low-data regimes: even when using only 200 training subjects, it outperforms the model trained with all available training subjects. This work thus underscores the potential to alleviate the manual annotation burden.
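
The intra- and inter-class correlation objective can be sketched as a DIST-style loss between the 2D (student) and 3D (teacher) prototype matrices. This sketch assumes rows are class prototypes and columns are class channels, so the inter-class term compares each prototype row with its teacher counterpart across channels and the intra-class term compares each channel column across prototypes, both via Pearson correlation as in the original DIST loss. The weights `beta` and `gamma` and the function names are illustrative assumptions, not the paper's extended objective.

```python
import numpy as np


def pearson(a, b, eps=1e-8):
    # Pearson correlation computed along the last axis (row-wise for 2D arrays).
    a = a - a.mean(axis=-1, keepdims=True)
    b = b - b.mean(axis=-1, keepdims=True)
    num = (a * b).sum(axis=-1)
    den = np.sqrt((a * a).sum(axis=-1) * (b * b).sum(axis=-1)) + eps
    return num / den


def dist_prototype_loss(p_student, p_teacher, beta=1.0, gamma=1.0):
    """DIST-style relation loss between two (C, C) prototype matrices.

    Hypothetical mapping: rows = class prototypes, columns = class channels.
    """
    # Inter-class term: each prototype vs. its teacher counterpart, across channels.
    inter = (1.0 - pearson(p_student, p_teacher)).mean()
    # Intra-class term: each channel vs. the teacher's, across prototypes.
    intra = (1.0 - pearson(p_student.T, p_teacher.T)).mean()
    return beta * inter + gamma * intra


P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2],
              [0.2, 0.2, 0.6]])
print(dist_prototype_loss(P, P) < 1e-6)  # True: identical prototypes
```

Pearson correlation is invariant to positive scaling and shifting, so `dist_prototype_loss(2 * P + 1, P)` is also essentially zero: the student is asked to match the teacher's relative pattern rather than its exact values, which is the relaxed matching that motivates DIST over KL-divergence-based distillation.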

Paper Structure (9 sections, 5 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: (a) Comparison of 3D and 2D model performance. All models are trained and tested on the MICCAI 2015 Multi-Atlas Abdomen Labeling Challenge (BTCV) dataset. UNesT [yu2023unest], a 3D model, achieves a higher Dice similarity coefficient (DSC) than the 2D models. (b) Example abdominal slices from the single-slice CT dataset with manual annotations of the right and left kidneys, liver, spleen, stomach, and aorta. The slices span a wide range of vertebral levels.
  • Figure 2: Overview of the proposed method. The pipeline consists of a frozen 3D model (teacher) and a trainable 2D model (student). The 3D model is pretrained to compute dataset-level class-wise feature centroids (prototypes). The 2D prototype is computed on the fly during distillation and is encouraged to be close to the 3D prototype by optimizing inter- and intra-class correlations. Each class prototype has its maximum response in the channel corresponding to its class index.
  • Figure 3: Segmentation comparison of models trained with and without distillation. In (a), models with distillation consistently reduce variation and improve the median and quartiles. * indicates statistical significance ($p < 0.05$, Wilcoxon signed-rank test). (b) shows the results of DeepLabv3 with and without distillation using different numbers of training subjects.
  • Figure 4: Qualitative results of models with and without distillation. (a), (b), and (c) show the results of DeepLabv3, Swin-Unet, and MISSFormer, respectively. White arrows highlight segmentation improvements on the right kidney (yellow), left kidney (green), spleen (red), liver (blue), and stomach (purple).
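
Figure 1 reports performance as the Dice similarity coefficient (DSC), defined as DSC = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a ground-truth mask G. A minimal per-organ NumPy sketch over integer label maps follows; it assumes label 0 is background and labels 1 to C-1 are organs, and the function name and label ordering are hypothetical.

```python
import numpy as np


def dice_per_class(pred, target, num_classes, eps=1e-8):
    """DSC for each foreground class 1..C-1 on integer label maps of equal shape.

    Returns NaN for a class absent from both prediction and ground truth,
    since DSC is undefined there.
    """
    scores = []
    for k in range(1, num_classes):
        p = pred == k
        t = target == k
        inter = np.logical_and(p, t).sum()
        denom = p.sum() + t.sum()
        scores.append(2.0 * inter / (denom + eps) if denom > 0 else np.nan)
    return np.array(scores)


pred = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
print(dice_per_class(pred, target, 3))  # approximately [0.6667 0.6667]
```

Per-subject DSC values from models trained with and without distillation could then be compared with a paired Wilcoxon signed-rank test (for example, `scipy.stats.wilcoxon`), the test named for the significance markers in Figure 3.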