How to select slices for annotation to train best-performing deep learning segmentation models for cross-sectional medical images?
Yixin Zhang, Kevin Kramer, Maciej A. Mazurowski
TL;DR
This study addresses how to optimally allocate manual annotations for cross-sectional medical images to train high-performing segmentation models under budget constraints. By systematically varying annotation density $\rho$, volume count $s$, slice-selection strategies, and the use of mask interpolation across 2D and 3D models on LiTS17, DBC, and ATLAS, it shows that distributing the annotation budget across more volumes with fewer slices per volume is generally advantageous, especially for 2D segmentation. It also finds that unsupervised active learning (UAL) within volumes rarely outperforms random or fixed-interval slice selection, and mask interpolation largely fails to improve performance except in a few 3D, low-density scenarios. These insights provide practical guidance for cost-effective annotation in cross-sectional medical imaging, while highlighting the need for dataset- and architecture-aware refinements and broader validation.
Abstract
Automated segmentation of medical images relies heavily on the availability of precise manual annotations. However, generating these annotations is often time-consuming and expensive, and it sometimes requires specialized expertise (especially for cross-sectional medical images). It is therefore essential to use annotation resources efficiently. In this paper, we systematically address the question: "in a non-interactive annotation pipeline, how should slices from cross-sectional medical images be selected for annotation to maximize the performance of the resulting deep learning segmentation models?" We conducted experiments on four medical imaging segmentation tasks with varying annotation budgets, numbers of annotated cases, numbers of annotated slices per volume, slice selection techniques, and mask interpolations. We found that: 1) Given a fixed annotation budget, it is almost always preferable to annotate fewer slices per volume and more volumes. 2) Selecting slices for annotation by unsupervised active learning (UAL) is not superior to selecting slices randomly or at fixed intervals, provided that each volume is allocated the same number of annotated slices. 3) Interpolating masks between annotated slices rarely enhances model performance, with the exception of a few specific configurations for 3D models.
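To make the budget trade-off concrete, the following is a minimal sketch of how a fixed slice budget might be split evenly across volumes and how slices could then be chosen at fixed intervals or at random. It is an illustration only, not the authors' pipeline: the function names (`slices_per_volume`, `fixed_interval_indices`, `random_indices`, `allocate`), the even split by integer division, the bin-centre placement of fixed-interval slices, and the choice to annotate the first `num_volumes` volumes are all assumptions made for this example.

```python
import random


def slices_per_volume(budget, num_volumes):
    """Evenly split a total slice budget across volumes (assumed policy)."""
    if num_volumes <= 0:
        raise ValueError("num_volumes must be positive")
    return budget // num_volumes


def fixed_interval_indices(depth, k):
    """Pick k slice indices spread evenly over a volume with `depth` slices."""
    k = min(k, depth)
    # Assumption: place each pick at the centre of its own equal-width bin.
    return [int((i + 0.5) * depth / k) for i in range(k)]


def random_indices(depth, k, rng):
    """Pick k distinct slice indices uniformly at random, in sorted order."""
    k = min(k, depth)
    return sorted(rng.sample(range(depth), k))


def allocate(depths, budget, num_volumes, strategy="fixed", seed=0):
    """Return {volume index: slice indices to annotate}.

    `depths` lists the number of slices in each available volume.
    Assumption: the first `num_volumes` volumes are the ones annotated.
    """
    rng = random.Random(seed)
    k = slices_per_volume(budget, num_volumes)
    plan = {}
    for vol, depth in enumerate(depths[:num_volumes]):
        if strategy == "fixed":
            plan[vol] = fixed_interval_indices(depth, k)
        elif strategy == "random":
            plan[vol] = random_indices(depth, k, rng)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return plan


if __name__ == "__main__":
    depths = [100, 80, 60, 120]  # hypothetical volume depths
    # Same budget of 12 slices, spread over 4 volumes or concentrated in 2.
    print(allocate(depths, budget=12, num_volumes=4))
    print(allocate(depths, budget=12, num_volumes=2))
```

With the same budget of 12 slices, the first call annotates 3 slices in each of 4 volumes, while the second annotates 6 slices in each of 2 volumes. Finding 1 above suggests the first kind of allocation is usually the better choice.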
