ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification

Alvaro Lopez Pellicer, Andre Mariucci, Plamen Angelov, Marwan Bukhari, Jemma G. Kerns

TL;DR

ProtoMedX introduces an explainable, prototype-based, multi-modal framework for bone health classification that fuses lumbar DEXA imagery with patient records. By employing dual prototype spaces and cross-modal fusion, it achieves state-of-the-art performance (up to 89.8% accuracy on three-class bone health and 91.2% sensitivity for Normal vs Abnormal) while providing inherently interpretable predictions through prototype similarity, confidence measures, and clinical feature analyses. The method uses multi-task learning with T-score regression to capture continuous bone density, and validation on 4,160 NHS patients demonstrates robust performance and clinically meaningful explanations. This work advances deployable explainable AI in osteoporosis diagnostics, aligning with regulatory expectations and offering clinicians transparent decision support with case-based reasoning.

Abstract

Bone health studies are crucial in medical practice for the early detection and treatment of Osteopenia and Osteoporosis. Clinicians usually make a diagnosis based on densitometry (DEXA scans) and patient history. Applying AI in this field is an area of ongoing research. Most successful methods rely on deep learning models that use vision alone (DEXA/X-ray imagery) and focus on prediction accuracy, while explainability is often disregarded and left to post hoc assessments of input contributions. We propose ProtoMedX, a multi-modal model that uses both DEXA scans of the lumbar spine and patient records. ProtoMedX's prototype-based architecture is explainable by design, which is crucial for medical applications, especially in the context of the upcoming EU AI Act, as it allows explicit analysis of model decisions, including incorrect ones. ProtoMedX demonstrates state-of-the-art performance in bone health classification while also providing explanations that clinicians can understand visually. Using a dataset of 4,160 real NHS patients, ProtoMedX achieves 87.58% accuracy in vision-only tasks and 89.8% in its multi-modal variant, both surpassing existing published methods.

Paper Structure

This paper contains 27 sections, 8 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Inconsistent quality of prototypes and heatmap localisation generated by ProtoPNet. Despite (1a) showing a clear, high-quality prototype, its heatmap (1b) exhibits poor localisation with diffuse activation. Conversely, a blurry prototype (2a) produces well-focused heatmap localisation (2b), revealing that prototype visual quality does not correlate with localisation accuracy.
  • Figure 2: Overview of ProtoMedX Architecture. Multi-modal prototype learning combines patient DEXA scans and clinical records via separate encoders, learns explainable vision and tabular prototypes, and fuses them in a joint prototype space. Classification and explanations derive from prototype similarity and case retrieval, enhancing clinical explainability.
  • Figure 3: t-SNE analysis of the Fused Prototype feature space with 18 learned prototypes, showing clear class separation.
  • Figure 4: t-SNE analysis of k-NN decision boundaries ($k=3$) in the Fused Prototype feature space, demonstrating how prototypes define diagnostic regions.
  • Figure 5: ProtoMedX Clinical Explanations. (1) Correct classification and (2) misclassification examples. Each panel includes: (a) prototype similarity with annotated clinical metadata, (b) model confidence and class voting distribution, and (c) clinical feature deviation analysis.
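
The figure captions describe classification by prototype similarity, with k-NN ($k=3$) voting over 18 learned prototypes and a per-class voting distribution. The sketch below is one way to illustrate that idea on toy data; it is not the authors' implementation. The function name `knn_prototype_vote`, the Euclidean distance metric, the 2-D embedding space, and the even split of 6 prototypes per class are all assumptions made for illustration.

```python
import numpy as np

def knn_prototype_vote(query, prototypes, labels, k=3):
    """Classify `query` by majority vote among its k nearest prototypes.

    Hypothetical helper: the distance metric (Euclidean) is an assumption,
    not something stated in the source.
    """
    # Distance from the query embedding to every prototype.
    dists = np.linalg.norm(prototypes - query, axis=1)
    # Indices of the k closest prototypes; these double as the
    # "retrieved cases" a clinician could inspect.
    nearest = np.argsort(dists)[:k]
    # Count votes per class and normalise into a voting distribution.
    votes = np.bincount(labels[nearest], minlength=labels.max() + 1)
    distribution = votes / votes.sum()
    return int(votes.argmax()), distribution, nearest

# Toy data: 18 prototypes, assumed 6 per class, in an assumed 2-D fused space.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 6)
prototypes = centres[labels] + 0.1 * rng.standard_normal((18, 2))

# A query embedding that lies close to the class-1 cluster.
pred, distribution, nearest = knn_prototype_vote(
    np.array([4.8, 0.2]), prototypes, labels
)
print(pred, distribution, nearest)
```

Returning the indices of the nearest prototypes alongside the prediction is what makes this style of classifier case-based: each decision can be traced to specific prototypes rather than to an opaque score.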