N$\text{A}^\text{2}$Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning
Zichuan Liu, Yuanyang Zhu, Chunlin Chen
TL;DR
This work addresses opaque credit assignment in cooperative MARL by introducing NA2Q, a value decomposition method based on neural additive models that makes agents' decisions transparent. NA2Q learns unary and pairwise shape functions to decompose the joint value, augmented with identity semantics via a VAE to provide interpretable local observations and backdoor-adjusted credits through attention. Theoretical analysis yields regret bounds for the enriched decomposition, and extensive experiments on Level-Based Foraging and SMAC demonstrate both strong performance and interpretable decision-making, including visual masks that reveal what agents attend to. Overall, NA2Q advances interpretable coordination in multi-agent systems while remaining competitive and providing diagnostic tools for understanding agent behavior.
Abstract
Value decomposition is widely used in cooperative multi-agent reinforcement learning; however, its implicit credit assignment mechanism is not yet fully understood because it relies on black-box networks. In this work, we study an interpretable value decomposition framework via the family of generalized additive models. We present a novel method, named Neural Attention Additive Q-learning (N$\text{A}^\text{2}$Q), that provides inherent intelligibility of collaboration behavior. By enriching shape functions to model all possible coalitions of agents, N$\text{A}^\text{2}$Q can explicitly factorize the induced optimal joint policy into individual policies. Moreover, we construct identity semantics to help estimate credits together with the global state and individual value functions, where local semantic masks help us diagnose whether each agent captures task-relevant information. Extensive experiments show that N$\text{A}^\text{2}$Q consistently achieves superior performance compared to different state-of-the-art methods on all challenging tasks, while yielding human-like interpretability.
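
To make the additive decomposition concrete, the following is a minimal NumPy sketch of a joint value assembled from unary and pairwise shape functions weighted by credits. It is an illustration under stated assumptions, not the authors' implementation: the names `init_params`, `shape_fn`, and `additive_q_tot` are hypothetical, the shape functions are tiny MLPs with non-negative weights (assumed here so that the joint value is non-decreasing in each individual value), and the credits are a plain softmax over free logits rather than attention over the global state and identity semantics.

```python
from itertools import combinations

import numpy as np


def init_params(rng, in_dim, hidden=8):
    """Random parameters for one small shape-function MLP (hypothetical helper)."""
    return (rng.normal(size=(in_dim, hidden)), np.zeros(hidden),
            rng.normal(size=(hidden, 1)), np.zeros(1))


def shape_fn(x, params):
    """One-hidden-layer MLP; abs() on the weights keeps it non-decreasing in x."""
    w1, b1, w2, b2 = params
    h = np.maximum(0.0, x @ np.abs(w1) + b1)  # ReLU hidden layer
    return float((h @ np.abs(w2) + b2)[0])


def additive_q_tot(q, unary, pairwise, credit_logits, bias=0.0):
    """Combine individual Q-values into a joint value, GAM-style.

    q             : array of shape (n_agents,), one chosen-action value per agent
    unary         : list of parameters, one shape function per agent
    pairwise      : dict {(i, j): parameters}, one shape function per agent pair
    credit_logits : array with one logit per term (assumption: free parameters)
    Returns the joint value and the per-term contributions, which are the
    quantities one would inspect for interpretability.
    """
    n = len(q)
    terms = [shape_fn(q[[i]], unary[i]) for i in range(n)]
    terms += [shape_fn(q[[i, j]], pairwise[(i, j)])
              for i, j in combinations(range(n), 2)]
    terms = np.array(terms)

    # Softmax credits: non-negative and summing to one.
    credits = np.exp(credit_logits - credit_logits.max())
    credits /= credits.sum()

    contributions = credits * terms
    return bias + contributions.sum(), contributions


# Usage with three agents: 3 unary terms + 3 pairwise terms.
rng = np.random.default_rng(0)
n_agents = 3
unary = [init_params(rng, 1) for _ in range(n_agents)]
pairwise = {p: init_params(rng, 2) for p in combinations(range(n_agents), 2)}
logits = rng.normal(size=n_agents + len(pairwise))
q_values = np.array([0.5, 1.2, -0.3])
q_tot, contributions = additive_q_tot(q_values, unary, pairwise, logits)
```

Because each term depends on at most two agents, the `contributions` vector can be read directly as how much each agent or pair of agents adds to the joint value. Higher-order coalitions would add further terms of the same form.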
