CTSpine1K: A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

Yang Deng, Ce Wang, Yuan Hui, Qian Li, Jun Li, Shiwei Luo, Mengke Sun, Quan Quan, Shuxin Yang, You Hao, Pengbo Liu, Honghu Xiao, Chunpeng Zhao, Xinbao Wu, S. Kevin Zhou

TL;DR

CTSpine1K delivers a large-scale, multi-source spine CT dataset (1,005 volumes, >11k vertebrae) with a rigorous annotation pipeline and a first benchmark for vertebra segmentation. Using nnU-Net as a baseline, the paper demonstrates strong segmentation performance on the new dataset while revealing domain gaps when evaluated against VerSe datasets, and highlights challenges in diseased or morphologically variant vertebrae. The work provides public data, tools, and a reproducible benchmarking framework to advance vertebra segmentation, labeling, and 3D spine reconstruction research across diverse imaging sources.

Abstract

Spine-related diseases have high morbidity and cause a huge burden of social cost. Spine imaging is an essential tool for noninvasively visualizing and assessing spinal pathology. Segmenting vertebrae in computed tomography (CT) images is the basis of quantitative medical image analysis for clinical diagnosis and surgery planning of spine diseases. Current publicly available annotated datasets on spinal vertebrae are small in size. Due to the lack of a large-scale annotated spine image dataset, the mainstream deep learning-based segmentation methods, which are data-driven, are heavily restricted. In this paper, we introduce a large-scale spine CT dataset, called CTSpine1K, curated from multiple sources for vertebra segmentation, which contains 1,005 CT volumes with over 11,100 labeled vertebrae belonging to different spinal conditions. Based on this dataset, we conduct several spinal vertebrae segmentation experiments to set the first benchmark. We believe that this large-scale dataset will facilitate further research in many spine-related image analysis tasks, including but not limited to vertebrae segmentation, labeling, 3D spine reconstruction from biplanar radiographs, image super-resolution, and enhancement.

Paper Structure

This paper contains 12 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Spine CT image examples with various conditions.
  • Figure 2: The proposed annotation pipeline.
  • Figure 3: The difference between the COLONOG dataset and the VerSe dataset.
  • Figure 4: Visualization results on different sub-datasets. The first through fourth rows show the sub-datasets COLONOG, MSD T10, HNSCC-3DCT-RT, and COVID-19, respectively. The last row shows some failed predictions resulting from spinal diseases: sacral lumbarization and lumbar sacralization.