Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

TL;DR

RAOS addresses the gap between high benchmark performance in abdominal organ segmentation and clinical applicability by emphasizing robustness to corner cases such as organ resection. It introduces RAOS, a manually labeled dataset of 413 CT scans with up to 19 organs per scan, partitioned into without-surgery, surgery-without-organ-missing, and surgery-with-organ-missing subsets to probe robustness. It benchmarks seven SOTA methods using the Dice similarity coefficient (DSC), normalized surface Dice (NSD), and an organ-hallucination metric. The results reveal substantial performance drops on clinical corner cases and organ-loss scenarios, and expose domain gaps when transferring to public datasets, underscoring the need for robustness and domain-adaptation research. By establishing RAOS as a baseline evaluation resource, the work supports the development of clinically reliable segmentation tools for radiotherapy planning and follow-up in abdominal cancer care.
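To make the three evaluation criteria concrete, here is a minimal NumPy sketch of per-organ DSC, NSD at a distance tolerance, and a simple hallucination check. It is not the RAOS evaluation code: the function names, the 6-connected voxel surface, the brute-force distance search, the empty-mask conventions, and the `min_voxels` threshold are all assumptions made for illustration. The hallucination rule assumed here is "the organ is absent from the ground truth (e.g. surgically removed) but the model still predicts it".

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: scored as agreement (a convention assumed here)
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def surface_coords(mask: np.ndarray) -> np.ndarray:
    """Indices of foreground voxels that touch the background (6-connectivity)."""
    padded = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = padded.copy()
    for axis in range(padded.ndim):
        for shift in (1, -1):
            # A voxel stays interior only if its neighbour along this axis is foreground.
            interior &= np.roll(padded, shift, axis=axis)
    border = padded & ~interior
    # Drop the padding so coordinates refer to the original mask.
    return np.argwhere(border[(slice(1, -1),) * padded.ndim])


def nsd(pred, gt, tau: float, spacing=(1.0, 1.0, 1.0)) -> float:
    """Normalized surface Dice: share of surface points within tau (mm) of the other surface."""
    sp = surface_coords(pred) * np.asarray(spacing, dtype=float)
    sg = surface_coords(gt) * np.asarray(spacing, dtype=float)
    if len(sp) == 0 and len(sg) == 0:
        return 1.0
    if len(sp) == 0 or len(sg) == 0:
        return 0.0
    # Brute-force pairwise distances: fine for toy masks, too memory-hungry for full CT volumes.
    d = np.linalg.norm(sp[:, None, :] - sg[None, :, :], axis=-1)
    close = (d.min(axis=1) <= tau).sum() + (d.min(axis=0) <= tau).sum()
    return float(close / (len(sp) + len(sg)))


def hallucinated(pred, gt, min_voxels: int = 1) -> bool:
    """True if the organ is absent from the ground truth but the model still predicts it."""
    return bool(gt.sum() == 0 and pred.sum() >= min_voxels)


# Toy example: a 4x4x4 "organ" and a prediction shifted by one voxel along the first axis.
gt = np.zeros((10, 10, 10), dtype=bool)
gt[2:6, 2:6, 2:6] = True
pred = np.zeros_like(gt)
pred[3:7, 2:6, 2:6] = True
print(dice(pred, gt), nsd(pred, gt, tau=1.0))
```

In this toy case the one-voxel shift costs a quarter of the overlap (DSC 0.75) but every boundary point stays within 1 mm of the other surface, so NSD at a 1 mm tolerance is 1.0. This is why the two metrics are reported together. A production pipeline would typically replace the brute-force search with a Euclidean distance transform.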

Abstract

Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains an open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset, comprising 413 CT scans (~80k 2D images, ~8k 3D organ annotations) from 413 patients, each with 17 (female) or 19 (male) labeled organs manually delineated by oncologists. Based on clinical information, we grouped the scans into 1) diagnosis/radiotherapy (317 volumes), 2) partial excision without the whole organ missing (22 volumes), and 3) excision with the whole organ missing (74 volumes). RAOS provides a potential benchmark for evaluating model robustness, including organ hallucination. It also includes organs that are rarely available in public datasets, such as the rectum, colon, intestine, prostate and seminal vesicles. We benchmarked several state-of-the-art methods on these three clinical groups to evaluate performance and robustness. We also assessed cross-generalization between RAOS and three public datasets. This dataset and comprehensive analysis establish a potential baseline for future robustness research: https://github.com/Luoxd1996/RAOS.
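Because the benchmark reports results separately for the three clinical groups, scores have to be stratified by subset, with present and absent organs handled differently. The sketch below shows one way to do this. It assumes per-(scan, organ) results have already been computed, that DSC is averaged only over organs present in the ground truth, and that absent organs contribute to a hallucination rate instead. The record layout, the subset labels, and this aggregation rule are hypothetical and are not taken from the RAOS code.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, NamedTuple, Optional


class OrganResult(NamedTuple):
    """One (scan, organ) evaluation; field names are hypothetical."""
    subset: str            # e.g. "diagnosis_radiotherapy", "partial_excision", "organ_missing"
    organ: str
    dsc: Optional[float]   # None when the organ is absent from the ground truth
    hallucinated: bool     # prediction non-empty although the organ is absent


def summarize(results: Iterable[OrganResult]) -> dict:
    """Per subset: mean DSC over present organs, hallucination rate over absent organs."""
    dscs = defaultdict(list)
    absent = defaultdict(int)
    halluc = defaultdict(int)
    for r in results:
        if r.dsc is not None:
            dscs[r.subset].append(r.dsc)
        else:
            absent[r.subset] += 1
            halluc[r.subset] += int(r.hallucinated)
    summary = {}
    for subset in set(dscs) | set(absent):
        summary[subset] = {
            "mean_dsc": mean(dscs[subset]) if dscs[subset] else None,
            "hallucination_rate": halluc[subset] / absent[subset] if absent[subset] else None,
        }
    return summary


results = [
    OrganResult("diagnosis_radiotherapy", "liver", 0.9, False),
    OrganResult("diagnosis_radiotherapy", "spleen", 0.8, False),
    OrganResult("organ_missing", "kidney_l", None, True),   # removed organ, still predicted
    OrganResult("organ_missing", "kidney_l", None, False),  # removed organ, correctly empty
    OrganResult("organ_missing", "liver", 0.7, False),
]
print(summarize(results))
```

Keeping absent organs out of the DSC average avoids mixing two different failure modes. A missed boundary lowers DSC, while predicting a resected organ is a distinct, clinically important error that the organ-missing subset is designed to expose.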

Paper Structure (13 sections, 1 figure, 6 tables)