Evaluating Explainable AI Methods in Deep Learning Models for Early Detection of Cerebral Palsy

Kimji N. Pellano, Inga Strümke, Daniel Groos, Lars Adde, Espen Alexander F. Ihlen

TL;DR

This work demonstrates that XAI methods can offer reliable and stable explanations for CP prediction models, and shows how those explanations can enhance the understanding of the specific movement patterns that characterize healthy and pathological development.

Abstract

Early detection of Cerebral Palsy (CP) is crucial for effective intervention and monitoring. This paper tests the reliability and applicability of Explainable AI (XAI) methods on a deep learning model that predicts CP by analyzing skeletal data extracted from video recordings of infant movements. Specifically, we use two XAI evaluation metrics, faithfulness and stability, to quantitatively assess the reliability of Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) in this medical application. We use a unique dataset of infant movements and apply skeleton data perturbations that do not distort the original dynamics of the infant movements. Because our CP prediction model is an ensemble, we evaluate performance on the XAI metrics for both the overall ensemble and the individual models. Our findings indicate that both XAI methods effectively identify key body points influencing CP predictions and that the explanations are robust against minor data perturbations. Grad-CAM significantly outperforms CAM in the RISv metric, which measures stability in terms of velocity. In contrast, CAM performs better in the RISb metric, which relates to bone stability, and in the RRS metric, which assesses internal representation robustness. Individual models within the ensemble show varied results, and neither CAM nor Grad-CAM consistently outperforms the other; the ensemble's results are representative of the outcomes of its constituent models.

Paper Structure (16 sections, 8 equations, 3 figures, 9 tables)

Figures (3)

  • Figure 1: Overview of CP prediction ensemble pipeline, showing the variables for XAI metrics evaluation and the processing steps for each model in the CP prediction ensemble.
  • Figure 2: Sample visualization of attribution scores coming from the ensemble model's XAI methods tested using the same video. From left to right: CAM, Grad-CAM, and random method. Green indicates low attribution scores, yellow indicates moderate scores, orange shows high scores, and red signifies very high scores relative to the defined threshold score.
  • Figure 3: A line plot showing the metrics test results. PGI (↑) indicates that higher scores denote better performance, while PGU (↓) indicates that lower scores denote better performance. For stability metrics (RISp, RISv, RISb, ROS, RRS), values closer to zero are optimal. ROS and RRS are plotted on logarithmic scales to better display a wide range of values. Each plot also shows the p-value from the unpaired t-test between CAM and Grad-CAM, indicating whether their difference is statistically significant.
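
To make the "closer to zero is better" reading of the stability metrics concrete, the following is a minimal sketch of a relative-stability score in the general form used in the XAI evaluation literature: the relative change in the explanation divided by the relative change in the input. It is not taken from the paper. The function names, the L2 norm, and the `eps` and `eps_min` guards are all illustrative assumptions, and the paper's exact definitions of the RIS, ROS, and RRS variants may differ (for example, in which quantity appears in the denominator).

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def relative_change(original, perturbed, eps=1e-8):
    """Element-wise (original - perturbed) / original.

    Entries of `original` with magnitude below `eps` are replaced by `eps`
    in the denominator to avoid division by zero (an assumed safeguard).
    """
    changes = []
    for o, p in zip(original, perturbed):
        denom = o if abs(o) > eps else eps
        changes.append((o - p) / denom)
    return changes


def relative_stability(x, x_pert, expl, expl_pert, eps_min=1e-6):
    """Relative explanation change divided by relative input change.

    A score near zero means the explanation barely moved under the
    perturbation. `eps_min` is an assumed lower bound on the denominator.
    """
    numerator = l2_norm(relative_change(expl, expl_pert))
    denominator = max(l2_norm(relative_change(x, x_pert)), eps_min)
    return numerator / denominator


def worst_case_stability(x, expl, perturbed_pairs):
    """Largest score over (perturbed input, perturbed explanation) pairs."""
    return max(
        relative_stability(x, x_p, expl, e_p) for x_p, e_p in perturbed_pairs
    )
```

In this sketch, `x` stands for whichever input representation a given metric is defined on (the abstract ties RISv to velocity and RISb to bones), and `expl` for the flattened attribution scores produced by CAM or Grad-CAM. Taking the maximum over several perturbations reports the worst case, which is one common convention rather than a confirmed detail of this work.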