Choose Your Explanation: A Comparison of SHAP and GradCAM in Human Activity Recognition

Felix Tempel, Daniel Groos, Espen Alexander F. Ihlen, Lars Adde, Inga Strümke

TL;DR

The paper tackles explainability in graph-based human activity recognition (HAR) by comparing SHAP and Grad-CAM on skeleton-based HAR across two real-world datasets. It provides both qualitative visualizations and quantitative perturbation analyses to evaluate how each method attributes importance to input features and body joints. The findings show that SHAP delivers detailed feature-level attributions, while Grad-CAM offers quicker, spatial explanations, with notable differences in emphasis across networks and datasets; the authors argue for using these methods in a complementary fashion. This work informs how to choose, and potentially combine, XAI approaches for HAR in healthcare contexts, where trust and actionable insights are critical.

Abstract

Explaining machine learning (ML) models using eXplainable AI (XAI) techniques has become essential to make them more transparent and trustworthy. This is especially important in high-stakes domains like healthcare, where understanding model decisions is critical to ensure ethical, sound, and trustworthy outcome predictions. However, users are often confused about which explainability method to choose for their specific use case. We present a comparative analysis of widely used explainability methods, Shapley Additive Explanations (SHAP) and Gradient-weighted Class Activation Mapping (Grad-CAM), within the domain of human activity recognition (HAR) utilizing graph convolutional networks (GCNs). By evaluating these methods on skeleton-based data from two real-world datasets, including a healthcare-critical cerebral palsy (CP) case, this study provides vital insights into both approaches' strengths, limitations, and differences, offering a roadmap for selecting the most appropriate explanation method based on specific models and applications. We qualitatively and quantitatively compare these methods, focusing on feature importance ranking, interpretability, and model sensitivity through perturbation experiments. While SHAP provides detailed input feature attribution, Grad-CAM delivers faster, spatially oriented explanations, making both methods complementary depending on the application's requirements. Given the importance of XAI in enhancing trust and transparency in ML models, particularly in sensitive environments like healthcare, our research demonstrates how SHAP and Grad-CAM could complement each other to provide more interpretable and actionable model explanations.

Paper Structure

This paper contains 20 sections, 6 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Model architecture with the locations where the gradients are obtained. The model consists of four input branches ($J, V, B, A$) processing different feature divisions obtained from the skeleton video. These features are fused later into a common main branch followed by the classifier. Our reference gradients $\nabla$ for the experiments are extracted after the attention activation (Att.) and after the temporal convolutional layer (TCN) in the main branch.
  • Figure 2: Comparison of the spatial explanations on the body key points with Grad-CAM for the TCN convolutional layer and attention activation layer, and SHAP for class 6 (pick up) of the NTU RGB+D dataset.
  • Figure 3: Comparison of the spatial explanations on the body key points with Grad-CAM for the TCN convolutional and attention activation layers, and SHAP for an infant with CP. The size and color indicate the activation in the respective body key point.
  • Figure 4: Perturbation experiment results for three classes from the NTU RGB+D dataset. The first row shows the drop in accuracy when important body key points are perturbed. The second row shows the results when unimportant body key points are perturbed. Both XAI methods perform better than randomly perturbing the body key points, which supports their correctness.
  • Figure 5: Perturbation results on the CP dataset. Both XAI methods perform better than randomly perturbing body key points, indicating their reliability. Grad-CAM values computed after the frame attention layer perform poorly, indicating that this layer is not well suited for obtaining spatial information.
  • ...and 2 more figures
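
The perturbation experiments described in the captions of Figures 4 and 5 can be illustrated with a minimal sketch. This is not the authors' implementation: the classifier `toy_model`, the per-joint `importance` scores (standing in for SHAP or Grad-CAM values), the input shape (samples, frames, joints, channels), and the use of Gaussian noise as the perturbation are all assumptions made for illustration. The idea is only that perturbing the key points ranked most important should reduce accuracy more than perturbing the least important or randomly chosen ones.

```python
import numpy as np


def perturb_joints(x, joints, rng, noise_std=3.0):
    """Return a copy of x with Gaussian noise added to the selected joints.

    x has the assumed shape (samples, frames, joints, channels).
    Gaussian noise is an illustrative choice, not the paper's scheme.
    """
    x_pert = x.copy()
    shape = x_pert[:, :, joints, :].shape
    x_pert[:, :, joints, :] += rng.normal(0.0, noise_std, size=shape)
    return x_pert


def toy_model(x):
    """Hypothetical stand-in for a trained GCN: it only looks at joints 2 and 5."""
    return (x[:, :, [2, 5], 0].mean(axis=(1, 2)) > 0).astype(int)


def accuracy(model, x, y):
    """Fraction of samples for which the model reproduces the label."""
    return float((model(x) == y).mean())


def run_perturbation(seed=0, k=2):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(200, 10, 8, 3))  # synthetic skeleton sequences
    y = toy_model(x)                      # labels defined by the toy model itself

    # Hypothetical per-joint importance scores, e.g. aggregated SHAP or
    # Grad-CAM values; higher means more important.
    importance = np.array([0.0, 0.1, 0.9, 0.05, 0.0, 0.8, 0.1, 0.02])

    order = np.argsort(importance)[::-1]  # most important first
    top_k, bottom_k = order[:k], order[-k:]
    random_k = rng.choice(x.shape[2], size=k, replace=False)

    return {
        "clean": accuracy(toy_model, x, y),
        "most_important": accuracy(toy_model, perturb_joints(x, top_k, rng), y),
        "least_important": accuracy(toy_model, perturb_joints(x, bottom_k, rng), y),
        "random": accuracy(toy_model, perturb_joints(x, random_k, rng), y),
    }


if __name__ == "__main__":
    for name, acc in run_perturbation().items():
        print(f"{name}: {acc:.3f}")
```

In this toy setting the least important joints are ignored by the model, so perturbing them leaves accuracy unchanged, whereas perturbing the top-ranked joints lowers it. With a real network the gap is a matter of degree, which is why a random baseline is included for comparison.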