The ULS23 Challenge: a Baseline Model and Benchmark Dataset for 3D Universal Lesion Segmentation in Computed Tomography

M. J. J. de Grauw, E. Th. Scholten, E. J. Smit, M. J. C. M. Rutten, M. Prokop, B. van Ginneken, A. Hering

TL;DR

The ULS23 benchmark for 3D universal lesion segmentation in chest-abdomen-pelvis CT examinations is introduced and a baseline semi-supervised 3D lesion segmentation model is developed and publicly released.

Abstract

Size measurements of tumor manifestations on follow-up CT examinations are crucial for evaluating treatment outcomes in cancer patients. Efficient lesion segmentation can speed up these radiological workflows. While numerous benchmarks and challenges address lesion segmentation in specific organs like the liver, kidneys, and lungs, the larger variety of lesion types encountered in clinical practice demands a more universal approach. To address this gap, we introduced the ULS23 benchmark for 3D universal lesion segmentation in chest-abdomen-pelvis CT examinations. The ULS23 training dataset contains 38,693 lesions across this region, including challenging pancreatic, colon and bone lesions. For evaluation purposes, we curated a dataset comprising 775 lesions from 284 patients. Each of these lesions was identified as a target lesion in a clinical context, ensuring diversity and clinical relevance within this dataset. The ULS23 benchmark is publicly accessible via uls23.grand-challenge.org, enabling researchers worldwide to assess the performance of their segmentation methods. Furthermore, we have developed and publicly released our baseline semi-supervised 3D lesion segmentation model. This model achieved an average Dice coefficient of 0.703 $\pm$ 0.240 on the challenge test set. We invite ongoing submissions to advance the development of future ULS models.
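The Dice coefficient reported above measures voxel overlap between a predicted mask and a reference mask. The following is a minimal sketch with NumPy, not the challenge's evaluation code: the function name `dice_coefficient`, the toy volumes, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks of identical shape.

    Hypothetical helper for illustration; not the ULS23 evaluation code.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")

    total = int(pred.sum()) + int(ref.sum())
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0

    intersection = int(np.logical_and(pred, ref).sum())
    return 2.0 * intersection / total


# Toy 3D example: the prediction covers the reference plus one extra slab.
reference = np.zeros((4, 4, 4), dtype=bool)
reference[1:3, 1:3, 1:3] = True      # 8 voxels

prediction = np.zeros((4, 4, 4), dtype=bool)
prediction[1:3, 1:3, 1:4] = True     # 12 voxels, 8 of them shared

score = dice_coefficient(prediction, reference)
print(f"Dice = {score:.3f}")
```

In the toy example the masks share 8 voxels out of 8 + 12, so the score is 2 · 8 / 20 = 0.8. A test-set summary such as the one in the abstract would be obtained by averaging per-lesion scores.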

Paper Structure (30 sections, 2 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Histograms depicting the long- and short-axis measurements in millimeters for various lesion types in the fully-annotated training data reveal notable trends. Kidney and colon lesions tend to be larger on average. Lymph nodes, pancreas, and colon lesions exhibit a greater disparity between their long- and short-axis sizes, indicating that these lesions are more often non-spherical.
  • Figure 2: Examples of GrabCut pseudo-masks. From left to right: a kidney lesion, a mediastinal lymph node, a subcutaneous mass, and a lung lesion. Note how GrabCut tends to oversegment (orange mask) into healthy tissue compared to the reference measurements (purple lines). Lung lesions are visualized using Window Level: -500 HU, Window Width: 1400 HU; lesions outside the lungs use WL: 350, WW: 40.
  • Figure 3: Training pipeline for the semi-supervised baseline model. a) In the first training iteration, an nnUnet is pretrained using the 2D GrabCut masks generated from the partially annotated data and then fine-tuned on the fully annotated data. b) In the second training iteration, a different nnUnet is pretrained using the predicted 3D pseudo-masks for the partially annotated data and then fine-tuned on the fully annotated data.
  • Figure 4: Boxplots of the long- and short-axis measurement errors for the baseline model on the different lesion types in the held-out training data. SAPE = Symmetric Absolute Percentage Error.
  • Figure 5: Boxplots of the long- and short-axis measurement errors for the baseline model on the test set. The fully-supervised types are lung, liver, kidney, colon, pancreas, and bone lesions and lymph nodes. Partially-supervised lesion types are those included in the partially annotated data, e.g., adrenal, ovary, and subcutaneous lesions. SAPE = Symmetric Absolute Percentage Error.
  • ...and 7 more figures
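
Two quantities in the captions above are easy to pin down in code: the intensity window used for display (Figure 2) and the SAPE axis-measurement error (Figures 4 and 5). The sketch below is illustrative only. The captions do not give the SAPE formula, so the common symmetric form |p - r| / ((|p| + |r|) / 2), expressed as a percentage, is assumed here; the function names `apply_window` and `sape` are hypothetical, and only the lung window values (-500 HU level, 1400 HU width) are taken from the caption.

```python
import numpy as np


def apply_window(hu, level, width):
    """Map Hounsfield units to [0, 1] display intensities for a given window.

    Values below level - width/2 clip to 0, values above level + width/2 clip to 1.
    """
    hu = np.asarray(hu, dtype=float)
    low = level - width / 2.0
    return np.clip((hu - low) / width, 0.0, 1.0)


def sape(predicted_mm, reference_mm):
    """Symmetric absolute percentage error between two axis lengths.

    Assumed definition (not confirmed by the captions):
    100 * |p - r| / ((|p| + |r|) / 2), which ranges from 0 to 200.
    """
    denominator = (abs(predicted_mm) + abs(reference_mm)) / 2.0
    if denominator == 0:
        return 0.0  # assumption: two zero-length measurements agree exactly
    return 100.0 * abs(predicted_mm - reference_mm) / denominator


# Lung window from the Figure 2 caption: level -500 HU, width 1400 HU,
# so the displayed range runs from -1200 HU to 200 HU.
samples_hu = np.array([-1200.0, -500.0, 200.0, 1000.0])
windowed = apply_window(samples_hu, level=-500.0, width=1400.0)
print(windowed)

# A predicted long axis of 12 mm against a reference of 10 mm.
error_a = sape(12.0, 10.0)
error_b = sape(10.0, 12.0)  # swapping the arguments gives the same error
print(f"SAPE = {error_a:.2f}%")
```

Under this assumed definition the error does not depend on which measurement is treated as the reference, which is what distinguishes it from a plain percentage error.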