SpineFM: Leveraging Foundation Models for Automatic Spine X-ray Segmentation
Samuel J. Simons, Bartłomiej W. Papież
TL;DR
SpineFM is a pipeline for automatic vertebra segmentation in cervical and lumbar spine X-rays that combines a foundation-model segmentation backbone (Medical-SAM-Adaptor, Med-SA) with an inductive, sequential vertebra localization strategy. The method starts from rough initial centroids produced by Mask R-CNN and refines the corresponding vertebral masks with Med-SA. It then propagates along the spine, using a small neural network to predict the centroid of each subsequent vertebra; each step is validated by IoU and a spine-end classification. On the NHANES II and CSXA datasets, SpineFM achieves state-of-the-art average Dice scores (0.921 to 0.942) and high vertebra identification rates (97.8% to 99.6%), surpassing prior approaches. Patch-based processing and exploitation of the spine's regular geometry reduce data requirements, yielding robust automated segmentation with potential clinical impact, and the code is available for replication.
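The inductive propagation described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `propagate_along_spine` and the callables `predict_next`, `refine`, and `is_spine_end` are hypothetical stand-ins for the centroid-prediction network, the Med-SA refinement with its IoU check, and the spine-end classifier. How the actual pipeline combines these checks is not specified in this summary.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def propagate_along_spine(
    seed_centroids: List[Point],
    predict_next: Callable[[List[Point]], Point],
    refine: Callable[[Point], Optional[Point]],
    is_spine_end: Callable[[Point], bool],
    max_vertebrae: int = 30,
) -> List[Point]:
    """Extend a list of vertebra centroids one vertebra at a time.

    All callables are hypothetical placeholders (assumptions):
      predict_next  -- guesses the next centroid from those found so far
      refine        -- segments a patch around the guess and returns the
                       refined centroid, or None if validation fails
      is_spine_end  -- reports that the guess lies past the last vertebra
    """
    centroids = list(seed_centroids)
    while len(centroids) < max_vertebrae:
        guess = predict_next(centroids)
        if is_spine_end(guess):
            break  # reached the end of the spinal column
        refined = refine(guess)
        if refined is None:
            break  # validation failed, stop propagating
        centroids.append(refined)
    return centroids


# Toy stand-ins so the sketch runs without any model.
def extrapolate(centroids: List[Point]) -> Point:
    # Linear extrapolation from the last two centroids.
    (x0, y0), (x1, y1) = centroids[-2], centroids[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


demo_centroids = propagate_along_spine(
    seed_centroids=[(50.0, 0.0), (50.0, 10.0)],
    predict_next=extrapolate,
    refine=lambda c: c,                    # accept every guess unchanged
    is_spine_end=lambda c: c[1] > 100.0,   # toy image boundary
)
print(len(demo_centroids), demo_centroids[-1])
```

In the toy run the seeds are extended in steps of 10 along the y-axis until the predicted centroid crosses the boundary; a real pipeline would replace each placeholder with a trained model.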
Abstract
This paper introduces SpineFM, a novel pipeline that achieves state-of-the-art performance in the automatic segmentation and identification of vertebral bodies in cervical and lumbar spine radiographs. SpineFM leverages the regular geometry of the spine, employing a novel inductive process to sequentially infer the location of each vertebra along the spinal column. Vertebrae are segmented using Medical-SAM-Adaptor, a robust foundation model that departs from the commonly used CNN-based models. We achieved outstanding results on two publicly available spine X-ray datasets, successfully identifying 97.8% and 99.6% of annotated vertebrae, respectively. For the identified vertebrae, our segmentation reached an average Dice of 0.942 and 0.921, respectively, surpassing previous state-of-the-art methods.
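For readers unfamiliar with the reported metrics, the following is a minimal sketch of the standard Dice and IoU definitions on binary masks, written in plain Python. The function names and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper's own evaluation code may differ.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def _overlap_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted pixels, target pixels)."""
    inter = n_pred = n_target = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            n_pred += 1 if p else 0
            n_target += 1 if t else 0
            inter += 1 if (p and t) else 0
    return inter, n_pred, n_target


def dice_score(pred: Mask, target: Mask) -> float:
    """Dice = 2 |A ∩ B| / (|A| + |B|)."""
    inter, n_pred, n_target = _overlap_counts(pred, target)
    if n_pred + n_target == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * inter / (n_pred + n_target)


def iou_score(pred: Mask, target: Mask) -> float:
    """IoU = |A ∩ B| / |A ∪ B|."""
    inter, n_pred, n_target = _overlap_counts(pred, target)
    union = n_pred + n_target - inter
    if union == 0:
        return 1.0  # same assumed convention as above
    return inter / union


# Tiny worked example: 3 predicted pixels, 3 target pixels, 2 shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(dice_score(pred, target))  # 2*2 / (3+3)
print(iou_score(pred, target))   # 2 / (3+3-2)
```

Dice weights the overlap twice relative to the two mask sizes, so for the same pair of masks it is never lower than IoU.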
