Evaluating and Improving Graph-based Explanation Methods for Multi-Agent Coordination
Siva Kailas, Shalin Jain, Harish Ravichandar
TL;DR
This work evaluates existing post-hoc graph explainers for GNN-based multi-agent coordination and introduces a theoretically motivated attention entropy regularizer for graph attention policies. The proposed regularizer encourages focused inter-agent attention, which, in turn, enhances the quality of explanations as measured by fidelity and faithfulness metrics across multiple tasks and team sizes, while maintaining task performance. The authors provide both theoretical analysis and extensive empirical results showing that Attention Explainer benefits most from entropy minimization, with GraphMask and GNN-Explainer displaying mixed or supplemental improvements. Overall, the paper advances interpretable graph-based policies in MARL and lays groundwork for further integration of explainability into multi-agent learning pipelines.
Abstract
Graph Neural Networks (GNNs), developed by the graph learning community, have been adopted and shown to be highly effective in multi-robot and multi-agent learning. Inspired by this successful cross-pollination, we investigate and characterize the suitability of existing GNN explanation methods for explaining multi-agent coordination. We find that these methods have the potential to identify the most influential communication channels that impact the team's behavior. Informed by our initial analyses, we propose an attention entropy regularization term that renders GAT-based policies more amenable to existing graph-based explainers. Intuitively, minimizing attention entropy incentivizes agents to limit their attention to the most influential or impactful agents, thereby easing the challenge faced by the explainer. We theoretically ground this intuition by showing that minimizing attention entropy increases the disparity between the explainer-generated subgraph and its complement. Evaluations across three tasks and three team sizes (i) provide insights into the effectiveness of existing explainers, and (ii) demonstrate that our proposed regularization consistently improves explanation quality without sacrificing task performance.
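To make the idea of attention entropy regularization concrete, the following is a minimal sketch, not the authors' implementation. It assumes each agent holds a softmax-normalized attention distribution over its neighbors, that the penalty is the Shannon entropy of that distribution averaged over agents, and that it is added to the task loss with a weight `lam`. The function names, the averaging scheme, and the value of `lam` are all hypothetical; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Normalize raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights, eps=1e-12):
    """Shannon entropy of one agent's attention weights over its neighbors."""
    return -sum(w * math.log(w + eps) for w in weights)


def mean_attention_entropy(per_agent_weights):
    """Average entropy across agents (assumed aggregation, not from the paper)."""
    entropies = [attention_entropy(w) for w in per_agent_weights]
    return sum(entropies) / len(entropies)


def regularized_loss(task_loss, per_agent_weights, lam=0.1):
    """Task loss plus a weighted attention entropy penalty (lam is hypothetical)."""
    return task_loss + lam * mean_attention_entropy(per_agent_weights)


# An agent attending equally to four neighbors versus mostly to one of them.
uniform = softmax([0.0, 0.0, 0.0, 0.0])
focused = softmax([5.0, 0.0, 0.0, 0.0])

print("uniform entropy:", attention_entropy(uniform))
print("focused entropy:", attention_entropy(focused))
print("loss (uniform):", regularized_loss(1.0, [uniform]))
print("loss (focused):", regularized_loss(1.0, [focused]))
```

Under these assumptions, the uniform distribution attains the maximum entropy for four neighbors (the natural log of 4), while the focused distribution has lower entropy and therefore incurs a smaller penalty. Minimizing the combined loss thus pushes attention toward a few neighbors, which is the behavior the abstract describes as easing the explainer's task.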
