
Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

TL;DR

The winning team established a new state-of-the-art with a deep learning-based cascaded framework, achieving average Dice Similarity Coefficient scores of 92.3% for organs and 64.9% for lesions on the hidden multi-national testing set.

Abstract

Organ and cancer segmentation in abdominal Computed Tomography (CT) scans is a prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to support comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation, providing a large-scale and diverse dataset of 4650 CT scans with various cancer types from over 40 medical centers. The winning team established a new state of the art with a deep learning-based cascaded framework, achieving average Dice Similarity Coefficient scores of 92.3% for organs and 64.9% for lesions on the hidden multi-national testing set. The dataset and the code of the top teams are publicly available, offering a benchmark platform to drive further innovation: https://codalab.lisn.upsaclay.fr/competitions/12239.
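The headline results are Dice Similarity Coefficient (DSC) scores. As a rough illustration only, not the challenge's official evaluation code, the sketch below computes DSC = 2|A∩B| / (|A| + |B|) between a predicted mask A and a reference mask B, then averages it over class labels within one scan. It assumes volumes are flattened into plain Python lists of integer labels; the function names `dice` and `mean_dice` and the convention of scoring two empty masks as 1.0 are assumptions made for illustration.

```python
from typing import Sequence


def dice(pred: Sequence[int], ref: Sequence[int]) -> float:
    """DSC between two flattened binary masks (nonzero = foreground)."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pred_labels: Sequence[int], ref_labels: Sequence[int],
              classes: Sequence[int]) -> float:
    """Average DSC over the given class labels of a multi-label volume."""
    scores = []
    for c in classes:
        # Binarize both label volumes for class c.
        p = [int(v == c) for v in pred_labels]
        r = [int(v == c) for v in ref_labels]
        scores.append(dice(p, r))
    return sum(scores) / len(scores)


# Toy 4-voxel volume with two foreground classes (hypothetical data).
print(mean_dice([1, 1, 2, 0], [1, 2, 2, 0], classes=[1, 2]))
```

Averaging such per-scan class means over all testing cases would give a dataset-level figure similar in spirit to the reported averages, though the official protocol may differ in details such as empty-mask handling.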

Paper Structure (17 sections, 4 figures)


Figures (4)

  • Figure 1: Overview of the challenge design. a, The challenge task is to automatically segment 13 abdominal organs and all types of lesions in abdominal CT scans. b, The complete challenge pipeline. c, Data source distributions of the challenge dataset. d, Comparison with existing abdominal organ and lesion segmentation benchmarks in terms of the number of CT scans. Distribution of key design choices among the 37 algorithms: e, network architecture, f, loss function, and g, optimizer.
  • Figure 2: Segmentation accuracy and efficiency performance analysis on the testing set (N=400). a, Segmentation accuracy (y axis) and efficiency (x axis) results of all 37 algorithms. Each circle denotes one team, and the circle size is proportional to GPU memory consumption. The top algorithms lie in the top left, reflecting a better accuracy-efficiency tradeoff. b, Performance comparison of the top five teams across all metrics. Each color indicates one team, and the value denotes the number of algorithms it surpassed on the corresponding metric. c, Ranking stability analysis for all metrics using a bootstrap approach (number of bootstrap samples $b=1000$). The violin plots show the distribution of Kendall's $\tau$ values, with an embedded box plot showing the median, interquartile range, and outliers. Consistently high Kendall's $\tau$ values across all metrics indicate that the algorithm rankings are stable across the different evaluation dimensions.
  • Figure 3: Organ-wise segmentation results of the top five teams on the testing set (N=400). The box plots show the distribution of DSC (dark color) and NSD (light color) scores for each organ across all testing cases: the black horizontal line inside each box marks the median, the box edges mark the lower and upper quartiles, and the vertical black lines (whiskers) extend to 1.5 times the interquartile range.
  • Figure 4: Lesion semantic and instance segmentation results of the top five algorithms on the testing set (N=400). a, Dot-box plots for lesion semantic segmentation metrics (DSC and NSD scores) and instance segmentation metrics (Precision, Recall, and F1 scores). b, Lesion panoptic quality of the top five algorithms and of the ensembles of the top three and top five algorithms. c, Relationship between segmentation accuracy and lesion volume for the winning algorithm. d, Relationship between the predicted and true lesion volumes for the winning algorithm. e, Visualized organ and tumor segmentation examples from the winning algorithm. The top row shows the reference standards and the bottom row shows the segmentation results.
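The ranking-stability analysis in Figure 2c can be read as follows: resample the test cases with replacement b times, re-rank the teams on each resample, and compare each bootstrap ranking with the full-test-set ranking using Kendall's τ. The sketch below is a simplified illustration under stated assumptions: it uses a single metric, ranks teams by mean score, and computes Kendall's tau-a without tie handling. The challenge's actual ranking scheme may aggregate several metrics, and the functions `kendall_tau`, `rank_teams`, and `bootstrap_taus` are hypothetical names.

```python
import random
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as lists of ranks (no ties)."""
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def rank_teams(scores, cases):
    """Rank teams (0 = best) by mean score over the given case indices.

    scores[t][c] is team t's score on test case c (higher is better).
    """
    cases = list(cases)
    means = [sum(team[c] for c in cases) / len(cases) for team in scores]
    order = sorted(range(len(scores)), key=lambda t: -means[t])
    ranks = [0] * len(scores)
    for r, t in enumerate(order):
        ranks[t] = r
    return ranks


def bootstrap_taus(scores, b=1000, seed=0):
    """Kendall's tau between the full ranking and b bootstrap re-rankings."""
    rng = random.Random(seed)
    n_cases = len(scores[0])
    full = rank_teams(scores, range(n_cases))
    taus = []
    for _ in range(b):
        # Resample test cases with replacement, keeping the original size.
        sample = [rng.randrange(n_cases) for _ in range(n_cases)]
        taus.append(kendall_tau(full, rank_teams(scores, sample)))
    return taus
```

Because `sorted` breaks ties in mean score by team index, tied teams are ordered arbitrarily; a faithful reproduction would need the challenge's own tie and rank-aggregation rules.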
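The box plots in Figure 3 summarize per-case scores by median, quartiles, and whiskers at 1.5 times the interquartile range (IQR). A minimal sketch of those summary statistics is shown below, assuming Tukey-style whiskers that stop at the most extreme observation inside the 1.5 × IQR fences and the "inclusive" quantile method from Python's `statistics` module; the plotting software used for the figure may compute quartiles differently. The function name `box_stats` and the sample scores are hypothetical.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot.

    Whiskers reach the most extreme observations lying within
    1.5 * IQR of the box; anything beyond is reported as an outlier.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical per-case DSC scores for one organ, with one failure case.
print(box_stats([0.80, 0.85, 0.88, 0.90, 0.92, 0.93, 0.95, 0.40]))
```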
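Figure 4a and 4b report lesion-level Precision, Recall, F1, and panoptic quality (PQ). The caption does not state how predicted and reference lesions are matched, so the sketch below assumes the IoU > 0.5 criterion from the standard panoptic quality definition, PQ = Σ IoU(TP) / (|TP| + ½|FP| + ½|FN|). It also assumes each lesion is given as a set of voxel indices and that lesions within one segmentation do not overlap, which makes any match above 0.5 IoU unique. The functions `iou` and `lesion_instance_metrics` are hypothetical illustrations, not the challenge's evaluation code.

```python
def iou(a: set, b: set) -> float:
    """Intersection over union of two voxel-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def lesion_instance_metrics(pred_lesions, ref_lesions, thr=0.5):
    """Precision, recall, F1 and panoptic quality for lesion instances.

    A predicted and a reference lesion form a true positive when their
    IoU exceeds thr (assumed criterion). With thr >= 0.5 and
    non-overlapping lesions, each lesion matches at most one counterpart.
    """
    matched_ref = set()
    tp_ious = []
    for p in pred_lesions:
        for k, r in enumerate(ref_lesions):
            score = iou(p, r)
            if k not in matched_ref and score > thr:
                matched_ref.add(k)
                tp_ious.append(score)
                break
    tp = len(tp_ious)
    fp = len(pred_lesions) - tp   # unmatched predictions
    fn = len(ref_lesions) - tp    # missed reference lesions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = tp + 0.5 * fp + 0.5 * fn
    pq = sum(tp_ious) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "pq": pq}


# One matched lesion (IoU 0.75), one false positive, one missed lesion.
ref = [{1, 2, 3, 4}, {10, 11}]
pred = [{1, 2, 3}, {20, 21}]
print(lesion_instance_metrics(pred, ref))
```

PQ rewards both detection (through the denominator) and delineation quality (through the summed IoU of matched lesions), which is why it can separate algorithms that have similar F1 scores.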