Evaluating Explanatory Capabilities of Machine Learning Models in Medical Diagnostics: A Human-in-the-Loop Approach

José Bobes-Bascarán, Eduardo Mosqueira-Rey, Ángel Fernández-Leal, Elena Hernández-Pereira, David Alonso-Ríos, Vicente Moret-Bonillo, Israel Figueirido-Arnoso, Yolanda Vidal-Ínsua

TL;DR

This study investigates how well machine learning models can explain their decisions in a medical diagnostic context, specifically pancreatic cancer, by integrating Human-in-the-Loop knowledge and NCCN guidelines. It compares Decision Trees, Random Forests, and XGBoost using three feature sets (recommended, maximum, minimum) derived from expert input and domain guidelines, and evaluates explanations with model-specific and model-agnostic XAI methods (MDI, MDA, SHAP, LIME). A weighted Jaccard similarity metric is introduced to quantify how well model explanations align with expert and guideline expectations. Results show modest predictive accuracy, attributed to limited data, with the minimum feature set often yielding the best performance. Explainability analyses reveal reasonable agreement across methods and show that simpler models (DT) provide sharper, more consistent explanations, while more complex models (XGBoost) distribute importance across more features. The paper argues for using explainability metrics alongside accuracy to select models that are both trustworthy and aligned with domain knowledge, and suggests future work on robust, domain-grounded explainability measures in healthcare.

Abstract

This paper presents a comprehensive study on evaluating the explanatory capabilities of machine learning models, focusing on Decision Tree, Random Forest, and XGBoost models trained on a pancreatic cancer dataset. We use Human-in-the-Loop techniques and medical guidelines as sources of domain knowledge to establish the importance of the features that are relevant to deciding on a pancreatic cancer treatment. These features are used not only as a dimensionality reduction approach for the machine learning models, but also as a way to evaluate the explainability of the different models using model-agnostic and model-specific explainability techniques. To facilitate interpretation of the explanatory results, we propose the use of similarity measures such as the Weighted Jaccard Similarity coefficient. The goal is to select not only the best-performing model but also the one that best explains its conclusions and aligns with human domain knowledge.
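
To make the similarity measure concrete, the following is a minimal sketch of a weighted Jaccard coefficient, assuming the common definition (sum of per-feature minima divided by sum of per-feature maxima over non-negative weights). The summary above does not give the paper's exact formulation, so this may differ from the authors' version. The function name `weighted_jaccard`, the feature names, and the weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def weighted_jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Weighted Jaccard similarity between two feature-importance maps.

    Assumed definition (not confirmed by the source):
        J(a, b) = sum_f min(a_f, b_f) / sum_f max(a_f, b_f)
    A feature missing from one map is treated as having weight 0.
    """
    if any(w < 0 for w in list(a.values()) + list(b.values())):
        raise ValueError("weights must be non-negative")

    features = set(a) | set(b)
    numerator = sum(min(a.get(f, 0.0), b.get(f, 0.0)) for f in features)
    denominator = sum(max(a.get(f, 0.0), b.get(f, 0.0)) for f in features)

    # Convention chosen here: two all-zero (or empty) maps have similarity 0.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical importances: one map from domain knowledge, one from a model.
expert = {"tumor_stage": 0.5, "age": 0.3, "performance_status": 0.2}
model = {"tumor_stage": 0.6, "age": 0.1, "biomarker": 0.3}

similarity = weighted_jaccard(expert, model)
print(f"weighted Jaccard similarity: {similarity:.3f}")
```

Under this definition the score is 1 when both maps assign identical weights and 0 when they share no features, so a higher value indicates that a model's explanation sits closer to the reference importances.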

Paper Structure (38 sections, 2 equations, 7 figures, 14 tables, 1 algorithm)

This paper contains 38 sections, 2 equations, 7 figures, 14 tables, 1 algorithm.

Figures (7)

  • Figure 1: Calling scheme between the different processes (PANCs).
  • Figure 2: Simplified graph of diagnostic decisions involving chemotherapy treatment.
  • Figure 3: Decision tree for the minimum set of features.
  • Figure 4: MDI for the DT with the minimum set of features.
  • Figure 5: MDA for the DT with the minimum set of features.
  • ...and 2 more figures