MedConv: Convolutions Beat Transformers on Long-Tailed Bone Density Prediction

Xuyin Qi, Zeyu Zhang, Huazhan Zheng, Mingxi Chen, Numan Kutaiba, Ruth Lim, Cherie Chiang, Zi En Tham, Xuan Ren, Wenxin Zhang, Lei Zhang, Hao Zhang, Wenbing Lv, Guangzhen Yao, Renda Han, Kangsheng Wang, Mingyuan Li, Hongtao Mao, Yu Li, Zhibin Liao, Yang Zhao, Minh-Son To

TL;DR

MedConv addresses CT-based bone density prediction under long-tailed class distributions by replacing transformer architectures with a computationally efficient 3D CNN backbone (3D ResNet-50). It couples Balanced Cross-Entropy loss and post-hoc logit adjustment to improve minority-class performance and probability calibration, leveraging high-quality TotalSegmentator segmentation. On the AustinSpine dataset, MedConv achieves up to 21% accuracy and 20% ROC AUC improvements over prior methods, outperforming transformer baselines in accuracy, sensitivity, and specificity. The findings highlight the practical potential for clinical deployment, emphasizing segmentation quality and careful hyperparameter tuning as essential factors for robust, efficient CT-based bone-density assessment.

Abstract

Predicting bone density from CT scans to estimate T-scores is clinically important: it gives a more precise assessment of bone health than traditional methods such as X-ray bone density tests, which lack spatial resolution and cannot detect localized changes. However, CT-based prediction faces two major challenges: the high computational complexity of transformer-based architectures, which limits their deployment in portable and clinical settings, and the imbalanced, long-tailed distribution of real-world hospital data, which skews predictions. To address these issues, we introduce MedConv, a convolutional model for bone density prediction that outperforms transformer models at lower computational cost. We also adapt Bal-CE loss and post-hoc logit adjustment to improve class balance. Extensive experiments on our AustinSpine dataset show that our approach achieves up to a 21% improvement in accuracy and a 20% improvement in ROC AUC over previous state-of-the-art methods.

Paper Structure

This paper contains 18 sections, 2 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Visualization of segmentation results on CT images. The first column shows the original images. The second column represents the segmentation results from CTSpine1K [deng2021ctspine1k]. The third column displays the segmentation results from TotalSegmentator [wasserthal2023totalsegmentator]. Rows correspond to different anatomical planes: the sagittal plane (S) in the first row, the axial plane (A) in the second row, and the coronal plane (C) in the third row. The region highlighted in red corresponds to the L5 vertebra, which plays a crucial role in diagnosing conditions like osteoporosis.
  • Figure 2: Comparison between 3D ResNet and 2D ResNet architectures for volumetric medical data processing. The upper pipeline illustrates the 3D ResNet-based MedConv model, which leverages three-dimensional convolutions to capture spatial and contextual information across volumetric CT scans. The inclusion of Bal-CE Loss further refines the model's focus on imbalanced data distributions, ensuring accurate predictions for the L1 vertebra segmentation task. Conversely, the lower pipeline showcases the standard 2D ResNet approach, where slices are treated independently without spatial continuity across adjacent slices, potentially limiting performance in tasks requiring volumetric context. This figure highlights the architectural and methodological differences, emphasizing the advantages of 3D ResNet for tasks that demand structural and contextual understanding of medical images.
  • Figure 3: Architecture of the proposed MedConv model, based on a 3D ResNet-50 backbone. The model leverages the volumetric spatial representation capabilities of 3D convolutions, essential for accurate bone density estimation. Key methodologies include the use of Balanced Cross-Entropy (Bal-CE) loss and post-hoc logit adjustment with hyperparameters $\tau_1 = 1$ and $\tau_2 = 0.5$, which enhance class balance and calibration.
  • Figure 4: Long-tailed distribution of T-score classifications within the AustinSpine dataset.
  • Figure 5: Experiment pipeline for evaluating segmentation methods and their impact on downstream tasks. This flowchart illustrates the comparison between CTSpine1K [deng2021ctspine1k] and TotalSegmentator [wasserthal2023totalsegmentator], two widely used segmentation algorithms. Both methods segment the L1 vertebra from input CT images, with the outputs subsequently processed by the MedConv module, followed by post-hoc logits optimized with balanced cross-entropy loss. TotalSegmentator was identified as the superior model, producing more robust and accurate segmentation results, which were selected as inputs for the MedConv module.
  • ...and 1 more figure
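
The Figure 3 caption names two long-tail techniques, Balanced Cross-Entropy (Bal-CE) loss and post-hoc logit adjustment, without spelling out their form. The sketch below illustrates the commonly used formulations: a balanced-softmax loss that adds the log class prior to the logits during training, and an inference-time adjustment that subtracts a scaled log prior. It is a minimal illustration under stated assumptions, not the authors' implementation. The class counts, the example logits, and the single `tau` parameter are hypothetical; this excerpt does not say how the paper assigns $\tau_1 = 1$ and $\tau_2 = 0.5$, so the values 1.0 and 0.5 are used here only as plausible settings.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def class_log_priors(class_counts):
    """Log of the empirical class frequencies in the training set."""
    total = sum(class_counts)
    return [math.log(c / total) for c in class_counts]


def balanced_ce(logits, target, class_counts, tau=1.0):
    """Balanced-softmax-style cross-entropy (assumed form of Bal-CE).

    Logits are shifted by tau * log(prior) before the softmax, so head
    classes are penalised less and tail classes more during training.
    """
    log_prior = class_log_priors(class_counts)
    shifted = [z + tau * lp for z, lp in zip(logits, log_prior)]
    return -log_softmax(shifted)[target]


def posthoc_adjust(logits, class_counts, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(prior) at inference."""
    log_prior = class_log_priors(class_counts)
    return [z - tau * lp for z, lp in zip(logits, log_prior)]


def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])


# Hypothetical long-tailed counts for three T-score classes.
counts = [800, 150, 50]
# Hypothetical raw logits for one scan.
logits = [2.0, 1.5, 1.8]

raw_pred = argmax(logits)
adjusted_pred = argmax(posthoc_adjust(logits, counts, tau=0.5))
tail_loss = balanced_ce(logits, target=2, class_counts=counts, tau=1.0)
head_loss = balanced_ce(logits, target=0, class_counts=counts, tau=1.0)

print(raw_pred, adjusted_pred)
print(round(tail_loss, 3), round(head_loss, 3))
```

With these hypothetical numbers, the raw logits favour the majority class (index 0), while the adjusted logits favour the rarest class (index 2). This is the intended effect: predictions are no longer biased toward the head of the distribution merely because it is frequent. In practice `tau` would be tuned on a validation set, which is consistent with the TL;DR's emphasis on careful hyperparameter tuning.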