
Stacked Ensemble of Fine-Tuned CNNs for Knee Osteoarthritis Severity Grading

Adarsh Gupta, Japleen Kaur, Tanvi Doshi, Teena Sharma, Nishchal K. Verma, Shantaram Vasikarla

TL;DR

This work tackles automatic grading of knee osteoarthritis (KOA) severity on the Kellgren-Lawrence (KL) scale by building a stacked ensemble of fine-tuned CNN backbones (MobileNetV2, YOLOv8, DenseNet201) with CatBoost as the meta-learner. The approach uses a class-weighted loss to counter dataset imbalance and a three-stage pipeline of preprocessing, fine-tuning, and stacking, validated on the OAI knee X-ray dataset. The ensemble reaches a balanced test accuracy of 87.5% for binary KOA detection and 73% for five-class grading, outperforming its individual CNNs and several prior methods. The authors compare meta-learners and suggest future enhancements via bagging, feature-based meta-learning, and transformer-based architectures. Overall, the authors present the method as making KOA detection and grading faster and more reliable, in support of clinical decision-making.
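The stacking step can be pictured as follows. In this minimal sketch, each fine-tuned base CNN is assumed to output one class-probability vector per image; these vectors are concatenated, in a fixed model order, into one feature row per image, and those rows (paired with the KL labels) form the meta-learner's training set. The function name `build_meta_features` and all probability values are hypothetical, and the summary does not say whether the paper stacks probabilities, logits, or hard labels.

```python
from typing import Dict, List, Sequence


def build_meta_features(
    base_probs: Dict[str, Sequence[Sequence[float]]],
    model_order: Sequence[str],
) -> List[List[float]]:
    """Concatenate each base model's class-probability vector into one row per image.

    base_probs maps a (hypothetical) model name to an n_images x n_classes
    list of predicted probabilities. model_order fixes the column layout so the
    same layout is used when training and when applying the meta-learner.
    """
    if not model_order:
        raise ValueError("model_order must name at least one base model")
    n_images = len(base_probs[model_order[0]])
    for name in model_order:
        if len(base_probs[name]) != n_images:
            raise ValueError(f"{name} has predictions for a different number of images")

    rows: List[List[float]] = []
    for i in range(n_images):
        row: List[float] = []
        for name in model_order:
            row.extend(base_probs[name][i])  # append this model's class probabilities
        rows.append(row)
    return rows


# Illustrative probabilities for two images on the five KL grades (made-up values).
probs = {
    "MobileNetV2": [[0.6, 0.2, 0.1, 0.05, 0.05], [0.1, 0.1, 0.2, 0.3, 0.3]],
    "YOLOv8": [[0.5, 0.3, 0.1, 0.05, 0.05], [0.05, 0.1, 0.25, 0.3, 0.3]],
    "DenseNet201": [[0.7, 0.1, 0.1, 0.05, 0.05], [0.1, 0.1, 0.1, 0.4, 0.3]],
}
X_meta = build_meta_features(probs, ["MobileNetV2", "YOLOv8", "DenseNet201"])
print(len(X_meta), len(X_meta[0]))  # 2 15
```

In the paper the meta-learner is CatBoost; with the `catboost` package, rows like these and their labels would be passed to a classifier's `fit` method. A common precaution in stacking, not confirmed here for this paper, is to produce the base-learner predictions on images the base learners were not trained on, so the meta-learner does not learn from overfit outputs.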

Abstract

Knee Osteoarthritis (KOA) is a musculoskeletal condition that can significantly limit daily activities, especially among older individuals. KOA severity is typically evaluated by analyzing X-ray images of the affected knee and assigning a grade under the Kellgren-Lawrence (KL) grading system, which classifies severity into five levels, from 0 to 4. This approach is time-consuming, requires a high level of expertise, and is susceptible to subjective interpretation, which can introduce diagnostic inaccuracies. To address this problem, a stacked ensemble model of fine-tuned Convolutional Neural Networks (CNNs) was developed for two classification tasks: a binary classifier that detects the presence of KOA, and a multiclass classifier that grades severity across the KL spectrum. The proposed stacked ensemble uses a diverse set of pre-trained architectures (MobileNetV2, You Only Look Once version 8 (YOLOv8), and DenseNet201) as base learners and Categorical Boosting (CatBoost) as the meta-learner. The proposed model achieved a balanced test accuracy of 73% in multiclass classification and 87.5% in binary classification, higher than previously reported results in the literature.
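Balanced accuracy, the headline metric above, is commonly defined as the mean of per-class recall, so each KL grade counts equally regardless of how many images it contains. The summary does not spell out the paper's exact computation; this sketch assumes that standard definition and uses a hypothetical `balanced_accuracy` helper.

```python
from collections import Counter
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes that occur in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    support = Counter(y_true)  # number of images per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Toy example: class 0 dominates, classes 1 and 2 are each half misclassified.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 2, 0]
print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.6667
```

On this toy data, plain accuracy is 6/8 = 0.75, while balanced accuracy is (1.0 + 0.5 + 0.5) / 3, roughly 0.667; the gap shows why a per-class average is the more informative figure on an imbalanced grading dataset.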

Paper Structure

This paper contains 14 sections, 13 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Workflow of the proposed methodology: Data Pre-processing, fine-tuning, and meta-learning.
  • Figure 2: The final model architecture, fine-tuned on the OAI dataset. The green layer represents the input image $I$; the yellow block represents the pre-trained CNN backbone; the light green layer represents the vector output of 2D global average pooling, which feeds a Dense layer (red) with ReLU activation. For multiclass classification (grading), the final Dense layer has 5 nodes with Softmax activation; for binary classification (detection), it has 1 node with Sigmoid activation.
  • Figure 3: Class-wise distribution of the number of samples in the OAI dataset. Here, 0, 1, 2, 3, and 4 represent the "No Disease", "Doubtful", "Minimal", "Moderate", and "Severe" classes, respectively.
  • Figure 4: Observed trends during DenseNet201 fine-tuning for multiclass classification.
  • Figure 5: Observed trends during MobileNetV2 fine-tuning for multiclass classification.
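Figure 3 documents the class imbalance that the class-weighted loss is meant to offset. The summary does not state the exact weighting scheme, so this is a minimal sketch assuming the common inverse-frequency ("balanced") scheme $w_c = N / (K \cdot n_c)$, where $N$ is the total sample count, $K$ the number of classes, and $n_c$ the count for class $c$. The helper names and the per-grade counts are hypothetical and are not the OAI numbers.

```python
import math
from typing import List, Sequence


def balanced_class_weights(counts: Sequence[int]) -> List[float]:
    """Inverse-frequency weights w_c = N / (K * n_c); rarer grades get larger weights."""
    if not counts or any(n <= 0 for n in counts):
        raise ValueError("every class needs a positive sample count")
    total, k = sum(counts), len(counts)
    return [total / (k * n) for n in counts]


def weighted_cross_entropy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Mean over samples of -w[y] * log(p[y]); probabilities are clipped at eps."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("need one label per probability vector")
    losses = [-weights[y] * math.log(max(p[y], eps)) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


counts = [400, 200, 200, 100, 100]  # hypothetical per-grade counts, NOT the OAI figures
weights = balanced_class_weights(counts)
print(weights)  # [0.5, 1.0, 1.0, 2.0, 2.0]

# Softmax outputs for two images with true grades 0 and 3 (made-up values).
loss = weighted_cross_entropy(
    [[0.7, 0.1, 0.1, 0.05, 0.05], [0.1, 0.1, 0.2, 0.5, 0.1]], [0, 3], weights
)
```

Here a mistake on a rare grade costs up to four times as much as one on the most common grade. This sketch averages over samples; PyTorch's weighted `CrossEntropyLoss` with mean reduction instead divides by the sum of the per-sample weights, so the two conventions differ in scale.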