From Movements to Metrics: Evaluating Explainable AI Methods in Skeleton-Based Human Activity Recognition

Kimji N. Pellano, Inga Strümke, Espen Alexander F. Ihlen

TL;DR

The paper addresses the lack of validated XAI evaluation metrics for skeleton-based HAR by testing $PGI$/$PGU$ faithfulness and $RIS$/$ROS$/$RRS$ stability on CAM and Grad-CAM explanations produced by EfficientGCN on the NTU RGB+D-60 dataset. It introduces biomechanically constrained perturbations with perturbation radius $r$ (tested from $2.5$ to $80$ cm) to assess metric robustness while keeping human kinematics realistic. The key finding is that faithfulness can be unreliable for this model, whereas stability provides a more dependable measure; moreover, CAM and Grad-CAM yield nearly identical explanations, highlighting the need for more diverse XAI methods in skeleton HAR. The study underscores the practical need for developing domain-specific XAI metrics and methods to ensure trustworthy explanations in high-stakes HAR applications, and it advocates broader cross-model analyses to guide method selection.

Abstract

The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human-computer interaction. This paper tackles a well-known gap in the field: the lack of testing of the applicability and reliability of XAI evaluation metrics in the skeleton-based HAR domain. To address this, we test two established XAI metrics, faithfulness and stability, on Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM). The study also introduces a perturbation method that respects human biomechanical constraints to ensure realistic variations in human movement. Our findings indicate that faithfulness may not be a reliable metric in certain contexts, such as with the EfficientGCN model. Conversely, stability emerges as a more dependable metric when the input data are only slightly perturbed. CAM and Grad-CAM are also found to produce almost identical explanations, leading to very similar XAI metric performance. This calls for more diversified metrics and new XAI methods in skeleton-based HAR.

Paper Structure (15 sections, 8 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Illustration of perturbing a point P(x, y, z) in 3D space to a new position P'(x', y', z') using spherical coordinates. The perturbation magnitude is represented by $r$, with azimuthal angle $\theta$ and polar angle $\phi$.
  • Figure 2: The EfficientGCN pipeline \cite{song2022constructing}, showing the variables for calculating faithfulness and stability. Perturbation is performed in the Data Preprocess stage.
  • Figure 3: Left to right: CAM, Grad-CAM, and baseline random attributions for a data instance in 'writing' (class 11), averaged for all frames and normalized. The color gradient denotes the score intensity: blue indicates 0, progressing to red which indicates a score of 1.
  • Figure 4: Evaluation metric outcomes for 'Writing' (Class 11, i.e. the weakest class), showing CAM (blue), Grad-CAM (orange), and the random (green) methods, for (a) PGI, (b) PGU, (c) RISb, (d) RISj, (e) RISv, (f) ROS, and (g) RRS. The $y$-axis measures the metric values, while the $x$-axis shows the perturbation magnitude. CAM and Grad-CAM graphs overlap due to extremely similar metric outcomes.
  • Figure 5: Evaluation metric outcomes for 'Jump Up' (Class 26, i.e. the strongest class), showing CAM (blue), Grad-CAM (orange), and the random (green) methods, for (a) PGI, (b) PGU, (c) RISb, (d) RISj, (e) RISv, (f) ROS, and (g) RRS. The $y$-axis measures the metric values, while the $x$-axis shows the perturbation magnitude. CAM and Grad-CAM graphs overlap due to extremely similar metric outcomes.
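
The spherical-coordinate perturbation described in the Figure 1 caption can be illustrated with a short sketch. This is not the authors' implementation: the function names `perturb_point` and `random_perturbation` are hypothetical, the angles are assumed to be drawn uniformly at random, and the biomechanical constraints mentioned in the abstract are not modeled here. It only shows how a point P(x, y, z) moves to P'(x', y', z') at a fixed distance $r$, using the standard spherical-to-Cartesian conversion.

```python
import math
import random


def perturb_point(p, r, theta, phi):
    """Shift p = (x, y, z) by distance r in the direction given by
    azimuthal angle theta and polar angle phi (both in radians)."""
    x, y, z = p
    return (
        x + r * math.sin(phi) * math.cos(theta),
        y + r * math.sin(phi) * math.sin(theta),
        z + r * math.cos(phi),
    )


def random_perturbation(p, r, rng=random):
    """Perturb p by a fixed magnitude r in a random direction.

    Assumption: theta ~ U[0, 2*pi) and phi ~ U[0, pi]. This choice is
    illustrative only; it is not stated in the source and does not
    sample directions uniformly over the sphere.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    phi = rng.uniform(0.0, math.pi)
    return perturb_point(p, r, theta, phi)


# Hypothetical joint position in centimetres, moved by r = 2.5 cm,
# the smallest perturbation magnitude mentioned in the TL;DR.
joint = (10.0, 20.0, 30.0)
moved = random_perturbation(joint, r=2.5, rng=random.Random(0))
displacement = math.dist(joint, moved)
print(moved, displacement)
```

Whatever angles are drawn, the displacement between `joint` and `moved` equals $r$ up to floating-point error, which is why the perturbation magnitude can serve as the single $x$-axis variable in Figures 4 and 5.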