Automatic hip osteoarthritis grading with uncertainty estimation from computed tomography using digitally-reconstructed radiographs
Masachika Masuda, Mazen Soufi, Yoshito Otake, Keisuke Uemura, Sotaro Kono, Kazuma Takashima, Hidetoshi Hamada, Yi Gu, Masaki Takao, Seiji Okada, Nobuhiko Sugano, Yoshinobu Sato
TL;DR
This work automates grading of hip osteoarthritis severity, expressed as Crowe and Kellgren-Lawrence (KL) grades, from digitally-reconstructed radiographs (DRRs) generated from CT. It evaluates three architectures (Vision Transformer, VGG, DenseNet) in classification and regression settings, tests both a combined (seven-class) and a separated (Crowe/KL) labeling scheme, and uses Monte-Carlo dropout to estimate model uncertainty. The approach is validated internally on 394 DRRs from 197 patients and tested externally on 104 DRRs from 52 patients. On the internal dataset it reaches high one-neighbor class accuracy (ONCA > 0.90) and moderate exact class accuracy (ECA around 0.65–0.66); external results are lower, reflecting distribution shift in severe cases, but remain informative. Importantly, model uncertainty correlates with prediction errors, suggesting that uncertainty estimates can serve as a surrogate for grading reliability and guide human review in large-scale CT databases. The code will be publicly released.
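As an illustration of the Monte-Carlo dropout idea mentioned above, the following is a minimal sketch, not the authors' implementation. It keeps dropout active at inference, runs several stochastic forward passes, averages the class probabilities, and reports the predictive entropy of the average as an uncertainty score. The toy linear classifier, the function names, the number of samples, and the choice of entropy as the uncertainty measure are all assumptions made for illustration; the summary does not state which measure the study used.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_prob, rng):
    """One pass of a toy linear classifier with dropout left ON (hypothetical model)."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob, rescale the rest.
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, n_samples=50, drop_prob=0.5, seed=0):
    """Return (predicted grade index, mean class probabilities, predictive entropy)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, drop_prob, rng)
               for _ in range(n_samples)]
    n_classes = len(weights)
    # Average the per-pass probabilities to approximate the predictive distribution.
    mean_probs = [sum(s[c] for s in samples) / n_samples for c in range(n_classes)]
    # Entropy of the averaged distribution: larger means less certain (assumed measure).
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    grade = max(range(n_classes), key=lambda c: mean_probs[c])
    return grade, mean_probs, entropy
```

In a review workflow of the kind the summary describes, cases whose uncertainty score exceeds a chosen threshold could be routed to a human reader; the threshold itself would have to be tuned on validation data.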
Abstract
Progression of hip osteoarthritis (hip OA) causes pain and disability and is likely to lead to surgical treatment such as hip arthroplasty at the terminal stage. The severity of hip OA is often classified using the Crowe and Kellgren-Lawrence (KL) classifications. However, because these classifications are subjective, we aimed to develop an automated approach to classify the disease severity based on the two grades using digitally-reconstructed radiographs (DRRs) from CT images. Automatic grading of hip OA severity was performed using deep learning-based models. The models were trained to predict the disease grade under two grading schemes: predicting the Crowe and KL grades separately, and predicting a new ordinal label that combines both grades and represents the disease progression of hip OA. The models were trained in classification and regression settings. In addition, the model uncertainty was estimated and validated as a predictor of classification accuracy. The models were trained and validated on a database of 197 hip OA patients, and externally validated on 52 patients. Model accuracy was evaluated using exact class accuracy (ECA), one-neighbor class accuracy (ONCA), and balanced accuracy. The deep learning models achieved comparable accuracy of approximately 0.65 (ECA) and 0.95 (ONCA) in the classification and regression settings. The model uncertainty was significantly larger in cases with large classification errors (P<6e-3). In this study, an automatic approach for grading hip OA severity from CT images was developed. The models showed comparable performance with high ONCA, which facilitates automated grading in large-scale CT databases and indicates the potential for further disease progression analysis. Classification accuracy was correlated with the model uncertainty, which would allow for the prediction of classification errors.
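The three evaluation metrics named in the abstract can be sketched in a few lines. This sketch assumes that grades are encoded as consecutive integers (for example 0 to 6 for a seven-class ordinal label), that ONCA counts a prediction as correct when it lies within one grade of the reference, and that balanced accuracy is the mean of per-class recall. The helper `regression_to_grade` is hypothetical: it shows one plausible way to map a continuous regression output to a grade, which the abstract does not specify.

```python
import math
from collections import defaultdict


def exact_class_accuracy(y_true, y_pred):
    """ECA: fraction of predictions that match the reference grade exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_neighbor_class_accuracy(y_true, y_pred):
    """ONCA: fraction of predictions within one ordinal grade of the reference."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare grades weigh as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def regression_to_grade(score, n_classes=7):
    """Hypothetical helper: round a continuous output to the nearest valid grade."""
    nearest = math.floor(score + 0.5)  # round half up, avoids banker's rounding
    return min(max(nearest, 0), n_classes - 1)


# Toy labels for illustration only; these are not data from the study.
y_true = [0, 1, 2, 3, 4, 5, 6, 6]
y_pred = [0, 2, 2, 3, 6, 5, 5, 6]
eca = exact_class_accuracy(y_true, y_pred)          # 5 of 8 exact matches
onca = one_neighbor_class_accuracy(y_true, y_pred)  # 7 of 8 within one grade
bal = balanced_accuracy(y_true, y_pred)
```

Because ONCA tolerates an error of one grade, it is always at least as large as ECA on the same predictions, which is consistent with the gap between the two values reported in the abstract.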
