Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation
Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael E. Kim, Rendong Zhang, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman
TL;DR
This work tackles single-slice CT segmentation, where 2D networks lack 3D context and lag behind 3D models. It introduces an unpaired 3D-to-2D distillation framework that transfers contextual knowledge from a pretrained 3D teacher to a 2D student through class-wise prototypes and an extended DIST objective that preserves inter- and intra-class relationships. Key contributions include (1) the first 3D-to-2D distillation approach in medical imaging, (2) learning the 3D prototype during pretraining so that no 3D input is needed at inference, and (3) demonstrated improvements across multiple 2D architectures and in low-data regimes, which reduces annotation burden. The approach makes it possible to leverage abundant 3D data to boost 2D single-slice multi-organ segmentation, with potential impact on efficiency and scalability in clinical workflows.
Abstract
2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, and these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework that leverages pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations and use it to guide the 2D student in learning intra- and inter-class correlations. Unlike traditional knowledge distillation methods, which require the teacher and student to receive the same input data, our approach employs unpaired 3D CT scans of any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach is effective in low-data regimes: with only 200 training subjects, it outperforms the model trained with all available training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.
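To make the prototype idea concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes that a "prediction distribution centroid" is the mean softmax output over all pixels or voxels of a given class, and that the relational objective is a DIST-style Pearson-correlation loss applied to the resulting class-by-class prototype matrices. The function names (`class_prototypes`, `relation_loss`), the toy logits, and the row/column reading of "inter-class" and "intra-class" are illustrative assumptions; the exact formulation in the paper may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_prototypes(probs, labels, num_classes):
    """Assumed centroid definition: mean predicted distribution per true class.

    Returns a num_classes x num_classes matrix whose row c is the average
    softmax vector over all samples labelled c (all zeros if c never occurs).
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for p, y in zip(probs, labels):
        counts[y] += 1
        for k in range(num_classes):
            sums[y][k] += p[k]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def pearson(a, b, eps=1e-8):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (math.sqrt(va * vb) + eps)


def relation_loss(student_proto, teacher_proto):
    """DIST-style loss on prototype matrices (illustrative assumption).

    Rows are compared to match how each prototype spreads probability
    across classes; columns are compared to match how one class's
    probability varies across prototypes.
    """
    n = len(teacher_proto)
    inter = sum(1.0 - pearson(s, t)
                for s, t in zip(student_proto, teacher_proto)) / n
    intra = sum(1.0 - pearson(s, t)
                for s, t in zip(zip(*student_proto), zip(*teacher_proto))) / n
    return inter + intra


# Toy example: teacher and student see different (unpaired) samples.
teacher_logits = [[4, 1, 0], [3, 0, 1], [0, 5, 1], [1, 4, 0], [0, 1, 6], [1, 0, 3]]
teacher_labels = [0, 0, 1, 1, 2, 2]
student_logits = [[2, 1, 1], [1, 2, 0], [0, 1, 2]]
student_labels = [0, 1, 2]

teacher_proto = class_prototypes([softmax(z) for z in teacher_logits], teacher_labels, 3)
student_proto = class_prototypes([softmax(z) for z in student_logits], student_labels, 3)
print(relation_loss(student_proto, teacher_proto))
```

Because the loss compares per-class summaries rather than per-sample outputs, the teacher and student inputs do not need to correspond, which is one way to read the "unpaired" property described in the abstract. The teacher prototypes can also be computed once and stored, consistent with the statement that no 3D input is needed at inference.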
