Explaining Human Activity Recognition with SHAP: Validating Insights with Perturbation and Quantitative Measures

Felix Tempel, Espen Alexander F. Ihlen, Lars Adde, Inga Strümke

TL;DR

This work integrates SHAP explanations with Graph Convolutional Networks for skeleton-based Human Activity Recognition, introducing ShapGCN to attribute feature contributions at the primary input level (J, V, B, A). A novel edge-matrix perturbation strategy tests the faithfulness of SHAP explanations by selectively masking influential body key points, quantified through Prediction Gap on Important feature perturbation (PGI) and Prediction Gap on Unimportant feature perturbation (PGU) alongside standard metrics. The approach is validated on two real-world datasets, CP infant movement data and NTU RGB+D 60, and shows that SHAP-identified key points exert the greatest influence on accuracy, specificity, and sensitivity, which supports more interpretable, trustworthy HAR models for high-stakes domains like healthcare. The study also discusses practical implications, computational costs, and potential extensions to biomarker discovery and to real-time or broader-domain HAR applications.

Abstract

In Human Activity Recognition (HAR), understanding the intricacy of body movements within high-risk applications is essential. This study uses SHapley Additive exPlanations (SHAP) to explain the decision-making process of Graph Convolutional Networks (GCNs) when classifying activities with skeleton data. We apply SHAP on two real-world datasets: one for cerebral palsy (CP) classification and the widely used NTU RGB+D 60 action recognition dataset. To test the explanations, we introduce a novel perturbation approach that modifies the model's edge importance matrix, allowing us to evaluate the impact of specific body key points on prediction outcomes. To assess the fidelity of our explanations, we employ informed perturbation, which targets body key points identified as important by SHAP, and compare it against random perturbation as a control condition. This comparison enables a judgment on whether the body key points are truly influential or non-influential based on the SHAP values. Results on both datasets show that body key points identified as important through SHAP have the largest influence on the accuracy, specificity, and sensitivity metrics. Our findings highlight that SHAP can provide granular insights into the input feature contribution to the prediction outcome of GCNs in HAR tasks. This demonstrates the potential for more interpretable and trustworthy models in high-stakes applications like healthcare or rehabilitation.
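
The informed-versus-random comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(samples, frames, key_points)` layout of the SHAP array, the use of mean absolute SHAP value as the ranking criterion, and the choice to zero the rows and columns of the selected key points are all assumptions made for illustration. The actual ShapGCN perturbation of the edge importance matrix may differ.

```python
import numpy as np


def rank_key_points(shap_values: np.ndarray) -> np.ndarray:
    """Return key point indices sorted by mean |SHAP|, most important first.

    Assumes shap_values has shape (samples, frames, key_points); this layout
    is hypothetical and chosen only for the example.
    """
    importance = np.abs(shap_values).mean(axis=(0, 1))
    # Stable sort keeps the original order among ties.
    return np.argsort(-importance, kind="stable")


def perturb_edge_matrix(edge_matrix: np.ndarray, key_points) -> np.ndarray:
    """Return a copy of the edge matrix with the chosen key points masked.

    Masking here means zeroing every edge weight that touches a selected
    key point (its row and its column). This is an assumed masking rule.
    """
    perturbed = edge_matrix.copy()
    perturbed[key_points, :] = 0.0
    perturbed[:, key_points] = 0.0
    return perturbed


def informed_and_random(edge_matrix, shap_values, k, seed=0):
    """Build one informed and one random perturbation of the same size k."""
    informed_pts = rank_key_points(shap_values)[:k]
    rng = np.random.default_rng(seed)
    random_pts = rng.choice(edge_matrix.shape[0], size=k, replace=False)
    return (
        perturb_edge_matrix(edge_matrix, informed_pts),
        perturb_edge_matrix(edge_matrix, random_pts),
    )
```

Under these assumptions, the model would be evaluated on the validation set once with each perturbed matrix. A larger drop in accuracy, specificity, or sensitivity under the informed perturbation than under the random control would support the claim that the SHAP-ranked key points are the influential ones.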

Paper Structure

This paper contains 28 sections, 8 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the proposed XAI pipeline for HAR - ShapGCN. The features computed from the skeleton data are fed into a pre-trained model, followed by the explanation through ShapGCN to determine the feature attributions on the primary input features. The explanation $\phi$ is validated with the proposed quantitative metrics $\mathcal{M}$ and interpreted visually. Based on $\phi$, the perturbed edge matrix $\mathbf{E}^{\prime}_n$ is constructed, and the model architecture is updated. Afterward, $\mathcal{D}_{val}$ is inferred on the updated model architecture.
  • Figure 2: Skeletons of the NTU RGB+D and CP datasets and the recording environment. (a) The NTU RGB+D skeleton with $n=19$ body key points. (b) The CP skeleton with $n=29$ body key points adds finer detail in the toes and the head region. (c) The recording environment for the CP dataset.
  • Figure 3: SHAP values of the primary input features. The figures show the SHAP value on the $x$-axis and the corresponding body key point on the $y$-axis, sorted in descending order of importance. The color of the dots indicates the feature value, with red corresponding to a high value, while blue is linked to a low feature value. The values are mean aggregated over the individual window time frames ($t$) and belong to class 1 (CP).
  • Figure 4: Local SHAP values of the combined input features. The features ($J, V, B,$ and $A$) are added together and shown for five different windows of one infant with CP. The size and color of the dots indicate the summed SHAP value. SHAP assigns importance to the knees in window 10 and to the toes in windows 20, 30, and 40, whereas negative importance for CP is assigned to the hand region in windows 1 and 10.
  • Figure 5: SHAP values of the primary input features. The figures show the SHAP value on the $x$-axis and the corresponding body key point on the $y$-axis, sorted in descending order of importance. The color of the dots indicates the feature value, with red corresponding to a high value, while blue is linked to a low feature value. The values are mean aggregated over the individuals performing the action and belong to class 6 (pick up).
  • ...and 2 more figures