SpineCLUE: Automatic Vertebrae Identification Using Contrastive Learning and Uncertainty Estimation

Sheng Zhang, Minheng Chen, Junxian Wu, Ziyue Zhang, Tonglong Li, Cheng Xue, Youyong Kong

TL;DR

SpineCLUE tackles vertebrae identification from CT scans with arbitrary fields of view by decomposing the problem into localization, segmentation, and identification at the vertebra level. It introduces dual-factor density clustering to robustly locate vertebrae centers, supervised contrastive learning to address inter-class similarity and intra-class variability, and an uncertainty-guided fusion mechanism to refine sequence predictions. The method achieves state-of-the-art ID-rate on VerSe19 and VerSe20 benchmarks and demonstrates strong generalization on an abnormal spine dataset with scoliosis and metal implants. The proposed framework offers a robust, scalable solution for clinical spine analysis under diverse imaging conditions.
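The summary does not say what the two factors in dual-factor density clustering are. As a hedged stand-in only, the sketch below uses classic density-peak clustering (Rodriguez and Laio, 2014). That method has two factors: the local density ρ of each candidate point, and the distance δ to the nearest denser point. The sketch uses them to group candidate center points (for example, voxels voted as vertebra centers) into one cluster per vertebra. The function name `density_peak_centers` and the thresholds `d_c`, `rho_min`, and `delta_min` are hypothetical and are not taken from the paper.

```python
import numpy as np

def density_peak_centers(points, d_c=3.0, rho_min=5, delta_min=10.0):
    """Hypothetical stand-in: density-peak clustering of (n, 3) candidate points.

    Factor 1: local density rho (number of neighbours closer than d_c).
    Factor 2: delta, distance to the nearest point of higher density.
    Returns (center_indices, labels), one cluster per presumed vertebra.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (fine for a few thousand candidates).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Factor 1: neighbours inside the cutoff, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1
    # Densest first; a stable sort breaks density ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # Factor 2
        nearest_higher[i] = j
    # A center must be dense AND far from every denser point.
    is_center = (rho >= rho_min) & (delta >= delta_min)
    is_center[order[0]] = True  # the densest point always seeds a cluster
    centers = np.flatnonzero(is_center)
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Walk down the density ordering; each point inherits the label of
    # its nearest denser neighbour, which is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return centers, labels
```

For example, two small 3×3×3 blobs of candidates placed 30 units apart along z yield two centers, and every point is assigned to its own blob. The δ factor is what separates adjacent vertebrae. Two equally dense peaks become separate clusters only if they lie farther apart than `delta_min`, which would need to be tuned to the expected inter-vertebra spacing.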

Abstract

Vertebrae identification in arbitrary fields-of-view plays a crucial role in diagnosing spine disease. Most spine CT scans contain only local regions, such as the neck, chest, and abdomen. Therefore, identification should not depend on specific vertebrae, or a particular number of vertebrae, being visible. Existing spine-level methods are unable to meet this challenge. In this paper, we propose a three-stage method that addresses 3D CT vertebrae identification at the vertebra level. By sequentially performing vertebrae localization, segmentation, and identification, the method makes effective use of the anatomical prior information of the vertebrae throughout the process. Specifically, we introduce a dual-factor density clustering algorithm to obtain localization information for each individual vertebra, which facilitates the subsequent segmentation and identification stages. In addition, to tackle inter-class similarity and intra-class variability, we pre-train our identification network with a supervised contrastive learning method. To further refine the identification results, we estimate the uncertainty of the classification network and use a message fusion module to combine the uncertainty scores while aggregating global information about the spine. Our method achieves state-of-the-art results on the VerSe19 and VerSe20 challenge benchmarks. Additionally, our approach demonstrates outstanding generalization performance on a collected dataset containing a wide range of abnormal cases.
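The abstract states that the identification network is pre-trained with a supervised contrastive method but does not give the loss. Below is a minimal sketch that assumes the standard SupCon objective of Khosla et al. (2020). Embeddings of crops with the same vertebra label (for example, two L1 crops from different scans) are treated as positives and pulled together. Every other crop in the batch is pushed away. This targets the inter-class similarity and intra-class variability described above. The function name `supcon_loss` and the temperature value are illustrative assumptions.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss (SupCon, Khosla et al. 2020), mean over anchors.

    features: (n, d) embeddings; labels: (n,) vertebra class ids.
    """
    z = np.asarray(features, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # project onto unit sphere
    labels = np.asarray(labels)
    n = len(z)
    logits = z @ z.T / temperature
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)  # an anchor never contrasts with itself
    # Numerically stable row-wise log-softmax over all other samples.
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    # Positives: same label, different sample.
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0  # anchors without any positive are skipped
    mean_log_prob_pos = np.where(positives, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())
```

With embeddings that already cluster by class, the loss is close to zero. Shuffling the labels so that positives sit far apart makes it much larger, and this gradient signal is what shapes the feature space before fine-tuning.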

Paper Structure

This paper contains 28 sections, 7 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Example illustration of challenges encountered in vertebrae identification. a) The first and second images have different FOVs (cervical and lumbar), and the vertebral posture differs between the second and third images. b) Adjacent vertebrae frequently have similar shapes: in the left picture, it is difficult to tell whether the vertebrae are L1-L4 or T12-L3. Meanwhile, vertebrae belonging to the same category (marked by white boxes in the right two figures) also vary. Together, these two characteristics make it harder to extract features specific to each category.
  • Figure 2: An overview of SpineCLUE. The three consecutive stages are connected by arrows. The scissors icon indicates that localization information is used to crop fine bounding boxes for segmentation, yielding segmentation masks that include the transverse and spinous processes. For identification, the dashed box on the left shows the pre-training process and the main network used, and the dashed box on the right shows the flow of fine-tuning, uncertainty estimation, and message fusion. The spark icon indicates that the network's parameters are trainable (gradients are not frozen), while the snowflake icon indicates that gradients are frozen and the network's weights are not updated.
  • Figure 3: Overview of uncertainty estimation module. The prediction distribution is estimated by sampling the confidence matrix $N$ times. Uncertainty scores are calculated by computing the entropy of the prediction distribution $p$.
  • Figure 4: Architecture of the uncertainty message fusion module. The uncertainty message fusion module utilizes the confidence matrices $(C_0^{t},\;C_2^{t})$ and uncertainty information $u$ of the correctly identified vertebrae $s_0$ and $s_2$ to correct the pre-identification results of $s_1$.
  • Figure 5: Qualitative results on low-resolution CT in the sagittal plane. Each picture is a sagittal slice of the spine overlaid with the corresponding vertebrae masks. Sub-figures a, b, c, and d show the ground truth, Payer et al. [payer2020coarse], Meng et al. [meng2023vertebrae], and SpineCLUE, respectively. Vertebrae masks are shown in different colors and annotated on the images. In this subject's image, the ground truth ranges from T12 to L5, excluding the sacrum (S1).
  • ...and 6 more figures
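Figure 3 describes the uncertainty score as the entropy of a prediction distribution p estimated by sampling the confidence matrix N times. The sketch below assumes the N sampled class-probability vectors for one vertebra are already available. How they are drawn, for instance by Monte Carlo dropout, is an assumption that the summary does not confirm. The sketch averages the samples into p and returns u = -Σ_k p_k log p_k, optionally divided by log K so that u lies in [0, 1]. The function name `predictive_entropy` and the `normalize` flag are hypothetical.

```python
import numpy as np

def predictive_entropy(samples, normalize=False, eps=1e-12):
    """Entropy of the mean prediction over N stochastic samples.

    samples: (N, K) array, N sampled probability vectors over K vertebra labels.
    """
    samples = np.asarray(samples, dtype=float)
    p = samples.mean(axis=0)  # estimated prediction distribution
    # Clip only inside the log so that 0 * log(0) contributes 0.
    u = -(p * np.log(np.clip(p, eps, 1.0))).sum()
    if normalize:
        u = u / np.log(samples.shape[1])  # maximum entropy is log K
    return float(u)
```

Samples that agree give u ≈ 0. Samples that split evenly between two neighboring labels, such as T12 and L1, give u = log 2 before normalization. This high-uncertainty signal would flag a vertebra such as s_1 in Figure 4 for correction by its more certain neighbors.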