
Fully Automated Deep Learning Based Glenoid Bone Loss Measurement and Severity Stratification on 3D CT in Shoulder Instability

Zhonghao Liu, Hanxue Gu, Qihang Li, Michael Fox, Jay M. Levin, Maciej A. Mazurowski, Brian C. Lau

TL;DR

This work introduces a fully automated deep-learning pipeline that quantifies glenoid bone loss on 3D CT. It segments the glenoid and humerus, detects posterior–inferior rim points, and fits a best-fit circle on an en-face view to compute bone loss as $(B/A)\times100\%$. The approach combines a transfer-learned segmentation backbone (TotalSegmentator/nnU-Net) with RimU-Net for rim-point prediction. It then estimates the en-face plane with SVD and applies radial least-squares circle fitting, with the circle diameter tuned to 0.6955 times the glenoid height. On 81 shoulders (60 train, 21 test), the pipeline achieves strong agreement with expert consensus (e.g., ICC $=0.838$; MAE $=4.28\%$) and high sensitivity in discriminating low and high bone-loss subgroups, outperforming the inter-reader baseline on several metrics. Its performance is robust enough to support preoperative planning in shoulder instability, particularly in extreme bone-loss cases, and the authors release their code and data on GitHub for reproducibility. Limitations include small subgroup sizes and the lack of external validation. Future work targets external validation, MRI integration, and extension to additional bone-loss patterns.
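The en-face step can be illustrated with a minimal NumPy sketch. It assumes the predicted rim points are already available as an N×3 array in scanner coordinates. The function names `en_face_basis` and `project_to_en_face` are hypothetical and are not taken from the authors' repository. Taking the right-singular vector with the smallest singular value of the centred points is the standard SVD (equivalently PCA) least-squares plane fit. The sketch does not claim to reproduce the paper's exact implementation.

```python
import numpy as np


def en_face_basis(rim_points):
    """Fit a plane to 3D rim points (N x 3) with SVD.

    Returns the centroid, two orthonormal in-plane axes (u, v) and the unit
    normal. The normal is the right-singular vector with the smallest singular
    value, i.e. the direction in which the centred points vary least.
    Its sign is arbitrary.
    """
    pts = np.asarray(rim_points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)  # rows of vt sorted by decreasing singular value
    u, v, normal = vt[0], vt[1], vt[2]
    return centroid, u, v, normal


def project_to_en_face(points, centroid, u, v):
    """Orthogonally project 3D points (M x 3) onto the fitted plane as 2D (u, v) coordinates."""
    centred = np.asarray(points, dtype=float) - centroid
    return np.column_stack([centred @ u, centred @ v])


# Synthetic check: a 12 mm circle lying in a tilted plane with normal (1, 1, 1)/sqrt(3)
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
true_normal = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
a = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)  # in-plane axis, orthogonal to true_normal
b = np.cross(true_normal, a)                    # second in-plane axis
rim = 12.0 * (np.outer(np.cos(theta), a) + np.outer(np.sin(theta), b)) + np.array([5.0, 6.0, 7.0])

centroid, u, v, normal = en_face_basis(rim)
rim_2d = project_to_en_face(rim, centroid, u, v)
print(abs(normal @ true_normal))                          # close to 1.0: normal recovered up to sign
print(np.abs(np.linalg.norm(rim_2d, axis=1) - 12.0).max())  # close to 0: in-plane distances preserved
```

Because the projection uses an orthonormal in-plane basis, in-plane distances, and therefore the circle diameter and defect length measured later, are not distorted. Only the out-of-plane component is discarded.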

Abstract

To develop and validate a fully automated deep-learning pipeline for measuring glenoid bone loss on 3D CT scans using the linear, en-face-view, best-fit-circle method. Shoulder CT scans of 81 patients were retrospectively collected between January 2013 and March 2023. Our algorithm consists of three main stages:

1. Segmentation: we developed a U-Net to automatically segment the glenoid and humerus.
2. Anatomical landmark detection: a second network predicts glenoid rim points.
3. Geometric fitting: we applied principal component analysis (PCA), projection, and circle fitting to compute the percentage of bone loss.

Pipeline performance was evaluated using the Dice similarity coefficient (DSC) for segmentation, and the mean absolute error (MAE) and intraclass correlation coefficient (ICC) for bone-loss measurement. Intermediate outputs (rim point sets and en-face view) were also assessed. Automated measurements showed strong agreement with consensus readings, exceeding surgeon-to-surgeon consistency (ICC 0.84 vs 0.78 for all patients; 0.71 vs 0.63 for low bone loss; 0.83 vs 0.21 for high bone loss; P < 0.001). In the classification task of assigning each patient to a bone-loss severity subgroup, the pipeline's sensitivity was 71.4% for the low-severity group and 85.7% for the high-severity group. No patient was misclassified from low to high or vice versa. A fully automated, deep-learning-based pipeline for glenoid bone-loss measurement on CT scans can be a clinically reliable tool to assist clinicians with preoperative planning for shoulder instability. We are releasing our model and dataset at https://github.com/Edenliu1/Auto-Glenoid-Measurement-DL-Pipeline.
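The circle-fitting and bone-loss stage can be sketched in NumPy as follows. The sketch makes two assumptions. First, the rim points have already been projected to 2D en-face coordinates. Second, the defect length $B$ has been measured separately, since how the authors measure $B$ is not reproduced here. The radial least-squares fit minimises $\sum_i (\lVert p_i - c\rVert - r)^2$ with Gauss–Newton iterations started from an algebraic (Kåsa) fit. The optional fixed-radius mode, with $A = 0.6955 \times$ glenoid height, is our assumption about how the reported diameter-ratio constraint could be applied. `fit_circle_radial` and `bone_loss_percent` are hypothetical names.

```python
import numpy as np

DIAMETER_RATIO = 0.6955  # fitted-circle diameter / glenoid height (value reported for the training set)


def fit_circle_radial(points_2d, radius=None, max_iter=50, tol=1e-10):
    """Radial least-squares circle fit: minimise sum_i (||p_i - c|| - r)^2.

    If `radius` is None, both centre and radius are estimated. If a radius is
    given, it is held fixed and only the centre is optimised.
    """
    p = np.asarray(points_2d, dtype=float)
    # Algebraic (Kasa) initialisation: x^2 + y^2 + D*x + E*y + F = 0
    design = np.column_stack([p[:, 0], p[:, 1], np.ones(len(p))])
    rhs = -(p[:, 0] ** 2 + p[:, 1] ** 2)
    d, e, f = np.linalg.lstsq(design, rhs, rcond=None)[0]
    centre = np.array([-d / 2.0, -e / 2.0])
    r = float(np.sqrt(centre @ centre - f)) if radius is None else float(radius)

    # Gauss-Newton refinement of the geometric (radial) residuals
    for _ in range(max_iter):
        diff = centre - p
        dist = np.linalg.norm(diff, axis=1)
        residual = dist - r
        jac = diff / dist[:, None]  # d(residual_i)/d(centre)
        if radius is None:
            jac = np.column_stack([jac, -np.ones(len(p))])  # d(residual_i)/d(r) = -1
        step = np.linalg.lstsq(jac, -residual, rcond=None)[0]
        centre = centre + step[:2]
        if radius is None:
            r += step[2]
        if np.linalg.norm(step) < tol:
            break
    return centre, r


def bone_loss_percent(defect_length_b, circle_diameter_a):
    """Bone loss = 100 * B / A (B: glenoid-defect length, A: fitted-circle diameter)."""
    return 100.0 * defect_length_b / circle_diameter_a


# Usage on a noisy partial arc (only part of the rim is intact); synthetic data
rng = np.random.default_rng(0)
theta = np.linspace(np.pi / 3, 5 * np.pi / 3, 80)
arc = np.column_stack([3 + 12 * np.cos(theta), -2 + 12 * np.sin(theta)])
arc += rng.normal(scale=0.05, size=arc.shape)

centre, r = fit_circle_radial(arc)                   # unconstrained fit
glenoid_height = 2 * 12 / DIAMETER_RATIO             # synthetic height chosen so that A = 24
centre_fixed, r_fixed = fit_circle_radial(arc, radius=DIAMETER_RATIO * glenoid_height / 2)
print(centre.round(2), round(r, 2))                  # close to [3, -2] and 12
print(bone_loss_percent(3.6, 2 * r_fixed))           # approximately 15 (B = 3.6, A = 24)
```

With a fixed radius, only the centre can move, so a short or irregular intact arc cannot inflate the circle. That is one plausible reason a diameter constraint could lower the error relative to unconstrained fitting. This rationale is an interpretation, not a claim from the source.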

Paper Structure

This paper contains 22 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Pipeline for automated glenoid bone loss measurement. (1) The segmentation model delineates the glenoid articular surface; (2) RimU-Net reads the segmentation mask and predicts rim points; (3) singular value decomposition (SVD) estimates the rim-plane normal, defining the en-face view; (4) the segmentation mask and rim points are projected onto the en-face plane; and (5) a 2D circle is fitted to the rim points. Bone loss is computed as $100\times B/A$, where $B$ is the glenoid-defect length and $A$ is the fitted-circle diameter.
  • Figure 2: Scatter plot comparing the algorithm-predicted en-face direction with the doctor baseline. Each point is one case. The X-axis is the angular error of one doctor's measurement relative to the consensus (the two doctors' mean); the Y-axis is the algorithm's predicted angular error relative to the same consensus. The dashed diagonal ($y{=}x$) marks parity with a typical single doctor; points below this line have lower error than a single doctor. The dotted line ($y{=}x{+}5^\circ$) indicates the pre-specified non-inferiority margin; points in the gray region at or below this line are within the allowable tolerance.
  • Figure 3: Optimization of Diameter Ratio for Glenoid Bone Loss Prediction on Training Dataset. The X-axis represents the diameter ratio (fitted circle diameter divided by glenoid height) tested across 11 values from 0.65 to 0.75; the Y-axis represents the mean absolute error (MAE, %) in bone loss prediction. The purple star ($\star$) marks the optimal diameter ratio of 0.6955, which achieved MAE = 6.24%. For comparison, the unconstrained fitting method achieved MAE = 8.81%.
  • Figure 4: Bland–Altman plots showing agreement between two doctors alongside agreement between the algorithm and consensus measurements.
  • Figure 5: Confusion matrix of the algorithm-predicted glenoid bone-loss class against the doctors' consensus. The predicted-class axis shows the algorithm's classification, and the true-class axis shows the doctors' consensus. Each cell shows the accuracy at its center and the number of samples below it.
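The evaluation statistics named in the abstract and in Figures 2, 4 and 5 can be sketched in NumPy. The sketch rests on four assumptions. The ICC form is not stated in this section, so it uses the two-way random-effects, absolute-agreement, single-measure ICC(2,1). The Bland–Altman limits use the conventional bias ± 1.96 SD. The angular error treats the en-face direction as an unsigned plane normal. The confusion-matrix rows are the true (consensus) classes. All function names are hypothetical, and the usage numbers are made up rather than taken from the study.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    a, b = np.asarray(mask_a, bool), np.asarray(mask_b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def mae(pred, ref):
    """Mean absolute error, e.g. in percentage points of bone loss."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(ref, float))))


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_subjects, k_raters) array.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means, col_means = y.mean(axis=1), y.mean(axis=0)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-subject mean square
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-rater mean square
    resid = y - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))          # residual mean square
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def bland_altman(a, b):
    """Mean difference (bias) and 95% limits of agreement, bias ± 1.96 SD."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    bias, sd = diff.mean(), diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def angular_error_deg(n_pred, n_ref):
    """Unsigned angle in degrees between two plane normals."""
    cos = abs(np.dot(n_pred, n_ref)) / (np.linalg.norm(n_pred) * np.linalg.norm(n_ref))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))


def non_inferior(algo_err, doctor_err, margin_deg=5.0):
    """True if the algorithm's error is at most the doctor's error plus the margin (y <= x + 5 deg)."""
    return algo_err <= doctor_err + margin_deg


def per_class_sensitivity(confusion):
    """Recall per class; rows are true (consensus) classes, columns are predicted classes."""
    c = np.asarray(confusion, dtype=float)
    return np.diag(c) / c.sum(axis=1)


# Usage with made-up numbers (not the study's data)
algo = [10.0, 22.5, 5.0, 31.0]
consensus = [12.0, 20.0, 6.5, 29.0]
print(mae(algo, consensus))                       # 2.0
print(icc_2_1(np.column_stack([algo, consensus])))
print(bland_altman(algo, consensus))
print(angular_error_deg([0, 0, 1], [0, 1, 1]))    # about 45
print(per_class_sensitivity([[4, 1], [1, 3]]))
```

Libraries such as pingouin provide tested ICC implementations that report all six Shrout–Fleiss forms. The form matching the paper should be confirmed before its ICC values are compared.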