How to select slices for annotation to train best-performing deep learning segmentation models for cross-sectional medical images?

Yixin Zhang, Kevin Kramer, Maciej A. Mazurowski

TL;DR

This study addresses how to optimally allocate manual annotations for cross-sectional medical images to train high-performing segmentation models under budget constraints. By systematically varying annotation density $\rho$, volume count $s$, slice-selection strategies, and the use of mask interpolation across 2D and 3D models on LiTS17, DBC, and ATLAS, it shows that distributing the annotation budget across more volumes with fewer slices per volume is generally advantageous, especially for 2D segmentation. It also finds that unsupervised active learning (UAL) within volumes rarely outperforms random or fixed-interval slice selection, and mask interpolation largely fails to improve performance except in a few 3D, low-density scenarios. These insights provide practical guidance for cost-effective annotation in cross-sectional medical imaging, while highlighting the need for dataset- and architecture-aware refinements and broader validation.

Abstract

Automated segmentation of medical images heavily relies on the availability of precise manual annotations. However, generating these annotations is often time-consuming, expensive, and sometimes requires specialized expertise (especially for cross-sectional medical images). Therefore, it is essential to optimize the use of annotation resources to ensure efficiency and effectiveness. In this paper, we systematically address the question: "in a non-interactive annotation pipeline, how should slices from cross-sectional medical images be selected for annotation to maximize the performance of the resulting deep learning segmentation models?" We conducted experiments on 4 medical imaging segmentation tasks with varying annotation budgets, numbers of annotated cases, numbers of annotated slices per volume, slice selection techniques, and mask interpolations. We found that: 1) It is almost always preferable to annotate fewer slices per volume and more volumes given an annotation budget. 2) Selecting slices for annotation by unsupervised active learning (UAL) is not superior to selecting slices randomly or at fixed intervals, provided that each volume is allocated the same number of annotated slices. 3) Interpolating masks between annotated slices rarely enhances model performance, with the exception of a few specific configurations for 3D models.
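
To make the budget trade-off and the two simple selection baselines concrete, here is a minimal Python sketch. It assumes the annotation budget is counted in annotated slices and that every volume receives the same number of slices. All function names and the example numbers are hypothetical and are not taken from the paper, whose exact definitions and selection rules may differ.

```python
import random


def volumes_within_budget(budget, slices_per_volume):
    """Volumes that can be annotated if each one gets the same slice count.

    Assumption: the budget is a total number of annotated slices.
    """
    return budget // slices_per_volume


def fixed_interval_slices(num_slices, k):
    """Pick k slice indices spread evenly through a volume of num_slices slices.

    Illustrative rule: take the centre of each of k equal-width bins.
    """
    step = num_slices / k
    return [int(step * i + step / 2) for i in range(k)]


def random_slices(num_slices, k, rng):
    """Pick k distinct slice indices uniformly at random, returned in order."""
    return sorted(rng.sample(range(num_slices), k))


# Same hypothetical budget of 120 slices, spent two different ways.
many_volumes = volumes_within_budget(120, 4)    # sparse: 4 slices per volume
few_volumes = volumes_within_budget(120, 12)    # dense: 12 slices per volume

interval_pick = fixed_interval_slices(100, 4)
random_pick = random_slices(100, 4, random.Random(0))
```

With these numbers the sparse setting covers 30 volumes and the dense setting covers 10, which is the kind of comparison the first finding refers to.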

Paper Structure

This paper contains 23 sections, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Average model performance drops when sparsely annotated datasets with different annotation density and volume counts are used for training 2D (left) and 3D (right) models. Volume counts are adjusted to match the annotation budget.
  • Figure 4: Performance of 2D and 3D models when trained on the same set of sparsely annotated volumes before and after the use of mask interpolation. The horizontal and vertical line segments plotted around the data points indicate the standard deviation for the "with M.I." and "no M.I." variants, respectively.
  • Figure 5: Distribution of relative performance for different slice selection methods.
  • Figure 6: LiTS17: The white masks indicate the liver, while the brown masks indicate liver tumors.
  • Figure 7: DBC: The top two rows display an MRI scan with corresponding breast masks, while the bottom two rows illustrate an example with fibroglandular tissue masks.
  • ...and 11 more figures