AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

Yuanfeng Ji; Haotian Bai; Jie Yang; Chongjian Ge; Ye Zhu; Ruimao Zhang; Zhen Li; Lingyan Zhang; Wanling Ma; Xiang Wan; Ping Luo

TL;DR

AMOS tackles the bottleneck of small, homogeneous abdominal segmentation datasets by introducing a large-scale, multi-center CT/MRI benchmark with 15 organ annotations. It couples a careful semi-automatic annotation workflow with diverse data splits (ID/OOD) to evaluate robustness, generalization, and cross-modality learning. The authors benchmark multiple baselines, revealing gaps in current methods and demonstrating AMOS’s utility for transfer learning and OOD research. Overall, AMOS provides a practical, real-world testing ground to push multi-organ segmentation toward clinically robust performance across diverse imaging scenarios.

Abstract

Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which limits the power of modern deep models and makes it difficult to provide a comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org.

Paper Structure (54 sections, 4 figures, 12 tables)

Introduction
Related Work
Abdominal organ segmentation datasets
Methods for single abdominal organ segmentation
Methods for abdominal multi-organ segmentation
AMOS
Dataset Construction
Data Overview
Data Collection
Data Annotation
Data Splits
Data Distribution
Dataset Statistics
Cohort Statistics
Annotation Statistics
...and 39 more sections

Figures (4)

Figure 1: Example annotated slices from the AMOS dataset. The top two and bottom two rows show CT and MRI slices, respectively, acquired from different scanners.
Figure 2: Annotation workflow of AMOS. Coarse annotations produced automatically by pre-trained segmentors are refined by human annotators over multiple rounds: 5 junior radiologists in the initial stage and 3 senior specialists in the second, checking stage.
Figure 3: Statistics on data targets and data annotation, reflecting that AMOS is a clinical, highly diverse dataset. The x-axis of both plots shows counts.
Figure 4: Organ volume distribution of the BTCV, CHAOS, AbdomenCT-1K and AMOS datasets.
