Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour
Bálint Gyevnár, Christopher G. Lucas, Stefano V. Albrecht, Shay B. Cohen
TL;DR
This work addresses explainability in autonomous multi-agent systems (MAS) by introducing AXIS, a framework that generates human-centred action explanations through counterfactual interrogation of a simulator. AXIS combines a counterfactual effect size model (CESM) with an LLM that proposes interventions, verbalises context, and forward-simulates trajectories, then synthesises an explanation from the results of multiple interrogation rounds. The authors formalise action explanations in partially observable stochastic games (POSGs), design a modular, configurable AXIS algorithm, and evaluate it on autonomous driving motion planning across ten scenarios using five LLMs. Compared to baselines, AXIS improves perceived explanation correctness and goal prediction accuracy while keeping action prediction accuracy comparable. The study also contributes an evaluation methodology covering robustness, subjective preference, correctness, and goal/action prediction, reports a Shapley analysis of context features, and releases the code openly, with practical implications for trust and safety in automated driving.
Abstract
Autonomous multi-agent systems (MAS) are useful for automating complex tasks but raise trust concerns due to risks such as miscoordination or goal misalignment. Explainability is vital for users' trust calibration, but explainable MAS face challenges due to complex environments, the human factor, and non-standardised evaluation. Leveraging the counterfactual effect size model and LLMs, we propose Agentic eXplanations via Interrogative Simulation (AXIS). AXIS generates human-centred action explanations for multi-agent policies by having an LLM interrogate an environment simulator using prompts like 'whatif' and 'remove' to observe and synthesise counterfactual information over multiple rounds. We evaluate AXIS on autonomous driving across ten scenarios for five LLMs with a comprehensive methodology combining robustness, subjective preference, correctness, and goal/action prediction with an external LLM as evaluator. Compared to baselines, AXIS improves perceived explanation correctness by at least 7.7% across all models and goal prediction accuracy by 23% for four models, with comparable action prediction accuracy, achieving the highest scores overall. Our code is open-sourced at https://github.com/gyevnarb/axis.
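The interrogation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy simulator, the `Query` type, the `interrogate` function, and the scripted proposer that stands in for the LLM are all hypothetical names introduced here for illustration. The sketch only shows the general shape of the idea, in which 'whatif' and 'remove' queries are issued over several rounds and each counterfactual outcome is compared with the factual one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Query:
    """Hypothetical counterfactual query; not the AXIS query format."""
    kind: str                      # "whatif" or "remove"
    agent: str                     # agent the intervention targets
    action: Optional[str] = None   # replacement action, used by "whatif" only


class ToySimulator:
    """Toy stand-in for an environment simulator (assumption for illustration).

    The ego vehicle changes lane only if the lead vehicle ahead is slow.
    """

    def __init__(self, actions: dict):
        self.actions = dict(actions)  # factual action of each non-ego agent

    def rollout(self, actions: dict) -> str:
        # Forward-simulate the ego vehicle's behaviour under the given actions.
        return "change_lane" if actions.get("lead") == "slow" else "keep_lane"

    def query(self, q: Query) -> str:
        # Apply the intervention to a copy of the factual actions, then simulate.
        actions = dict(self.actions)
        if q.kind == "whatif":
            actions[q.agent] = q.action
        elif q.kind == "remove":
            actions.pop(q.agent, None)
        else:
            raise ValueError(f"unknown query kind: {q.kind}")
        return self.rollout(actions)


def interrogate(sim: ToySimulator,
                propose: Callable[[list], Optional[Query]],
                max_rounds: int = 3):
    """Run up to max_rounds of counterfactual queries and record the outcomes."""
    factual = sim.rollout(sim.actions)
    history = []
    for _ in range(max_rounds):
        q = propose(history)       # in AXIS an LLM would choose the next query
        if q is None:              # the proposer decides it has enough evidence
            break
        outcome = sim.query(q)
        history.append((q, outcome, outcome != factual))
    return factual, history


def scripted_proposer(history: list) -> Optional[Query]:
    """Fixed script standing in for the LLM's choice of queries."""
    script = [Query("whatif", "lead", "fast"), Query("remove", "lead")]
    return script[len(history)] if len(history) < len(script) else None


sim = ToySimulator({"lead": "slow", "other": "fast"})
factual, history = interrogate(sim, scripted_proposer)
```

In this toy setting both interventions on the lead vehicle flip the ego vehicle's behaviour, which is the kind of counterfactual evidence an explanation could then be synthesised from. In the actual framework the LLM both proposes the queries and writes the final explanation, which this sketch does not attempt.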
