Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour

Bálint Gyevnár, Christopher G. Lucas, Stefano V. Albrecht, Shay B. Cohen

TL;DR

This work addresses explainability in autonomous multi-agent systems by introducing AXIS, a framework that generates human-centered action explanations through counterfactual interrogations. AXIS combines a counterfactual effect size model (CESM) with an LLM to propose interventions, verbalise context, and forward-simulate trajectories, producing explanations across multi-round interrogations. The authors formalize action explanations in partially observable stochastic games (POSGs), design a modular AXIS algorithm with options, and evaluate it on autonomous driving motion planning across ten scenarios using five LLMs, demonstrating improved perceived correctness and goal prediction while maintaining actionability. The study provides a rigorous evaluation methodology, reveals insights from Shapley analysis of context features, and makes the code openly available, contributing to robust, human-centered explainability in MAS with practical implications for trust and safety in automated driving.

Abstract

Autonomous multi-agent systems (MAS) are useful for automating complex tasks but raise trust concerns due to risks such as miscoordination or goal misalignment. Explainability is vital for users' trust calibration, but explainable MAS face challenges due to complex environments, the human factor, and non-standardised evaluation. Leveraging the counterfactual effect size model and LLMs, we propose Agentic eXplanations via Interrogative Simulation (AXIS). AXIS generates human-centred action explanations for multi-agent policies by having an LLM interrogate an environment simulator using prompts like 'whatif' and 'remove' to observe and synthesise counterfactual information over multiple rounds. We evaluate AXIS on autonomous driving across ten scenarios for five LLMs with a comprehensive methodology combining robustness, subjective preference, correctness, and goal/action prediction with an external LLM as evaluator. Compared to baselines, AXIS improves perceived explanation correctness by at least 7.7% across all models and goal prediction accuracy by 23% for four models, with comparable action prediction accuracy, achieving the highest scores overall. Our code is open-sourced at https://github.com/gyevnarb/axis.
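The multi-round interrogation described in the abstract can be illustrated with a minimal sketch. It assumes a toy simulator and scripted stand-ins for the LLM; the names `ToySimulator`, `interrogate`, `scripted_propose`, and `scripted_synthesise` are hypothetical and are not taken from the AXIS repository. Only the two query names, 'whatif' and 'remove', come from the text above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical illustration only: none of these names come from the AXIS code base.
Query = Tuple[str, Tuple[str, ...]]  # (query kind, arguments)


class ToySimulator:
    """Stand-in environment with one hard-coded rule for the ego vehicle."""

    def __init__(self, agents):
        self.agents = set(agents)

    def _ego_action(self, agents, overrides):
        # Toy rule (an assumption): ego yields only if 'orange' is present and driving.
        if "orange" in agents and overrides.get("orange", "drive") == "drive":
            return "yield"
        return "go"

    def remove(self, agent: str) -> str:
        """Counterfactual: re-run the scene without `agent`."""
        action = self._ego_action(self.agents - {agent}, {})
        return f"remove {agent}: ego would {action}"

    def whatif(self, agent: str, action: str) -> str:
        """Counterfactual: re-run the scene with `agent` taking `action`."""
        ego = self._ego_action(self.agents, {agent: action})
        return f"whatif {agent} {action}: ego would {ego}"


def interrogate(
    simulator: ToySimulator,
    propose: Callable[[str, List[str]], Optional[Query]],
    synthesise: Callable[[str, List[str]], str],
    question: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Alternate between querying the simulator and synthesising an explanation."""
    evidence: List[str] = []
    explanation = ""
    for _ in range(max_rounds):
        query = propose(question, evidence)
        if query is None:  # the proposer has nothing further to ask
            break
        kind, args = query
        if kind == "remove":
            result = simulator.remove(*args)
        elif kind == "whatif":
            result = simulator.whatif(*args)
        else:
            raise ValueError(f"unsupported query kind: {kind}")
        evidence.append(result)
        explanation = synthesise(question, evidence)
    return explanation, evidence


def scripted_propose(question: str, evidence: List[str]) -> Optional[Query]:
    """Scripted stand-in for the LLM's choice of the next query."""
    plan: List[Query] = [("remove", ("orange",)), ("whatif", ("orange", "stop"))]
    return plan[len(evidence)] if len(evidence) < len(plan) else None


def scripted_synthesise(question: str, evidence: List[str]) -> str:
    """Scripted stand-in for the LLM's synthesis step."""
    return f"{question} -> " + "; ".join(evidence)


explanation, evidence = interrogate(
    ToySimulator(["orange", "white"]),
    scripted_propose,
    scripted_synthesise,
    "Why did ego yield?",
)
print(explanation)
```

In this sketch both counterfactuals change the ego vehicle's action from yielding to going, which is the kind of evidence a synthesis step could turn into "ego yielded because orange was driving". A real system would replace the scripted functions with LLM calls and the toy rule with a forward simulation of the agents' policies.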

Paper Structure

This paper contains 20 sections, 2 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: When a user asks a question about an agent's actions, AXIS retrieves the current context, which the LLM uses to interrogate a simulator of the environment, eliciting counterfactual information about the actions of agents. The LLM synthesises these counterfactuals into explanations, repeating the process over multiple rounds. At the end, AXIS prompts the LLM to synthesise all counterfactuals into a final explanation, which is returned to the user as the answer to their question. Arrow legend: blue arrows pass symbolic data with conversion to text, green arrows pass explanations, and red arrows pass interrogation prompts.
  • Figure 2: Example scenarios with the queried vehicle shown in blue, other vehicles in orange/white, and occluded vehicles in red. Scenario #3 (left; rational): blue sees orange come to a stop, indicating orange's intent to turn left, so blue decides to turn right instead of waiting longer to yield. Scenario #8 (right; occlusion): blue sees orange on a priority lane coming to a stop, inferring that red must be behind the building on the left, so blue stops.
  • Figure 3: Shapley values calculated from perceived correctness for GPT-4.1 (top) and Llama 3.3-70B (bottom) models. The $x$-axis shows the Shapley value contribution to the final correctness score. The blue Total bar shows the cumulative Shapley value with all features. Results are aggregated across scenarios #3, #7, #8. Error bars show standard error of mean.
  • Figure 4: The proportion of queries across all scenarios and prompts used by the different models. Most models focus on remove and whatif queries, which are best suited to extracting counterfactuals.
  • Figure 5: Evolution of preference/correctness (top), and goal/action prediction accuracy (bottom) of intermediate explanations over interrogation-synthesis rounds, aggregated across models, scenarios, and user prompts. The number of explanations for each round is shown above the $x$-axis in boxes. Horizontal dotted lines show the ModelOnly baseline.
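
The Shapley analysis referenced in Figure 3 attributes a final score to individual context features. The sketch below shows the textbook exact Shapley computation over a small feature set; it is not the authors' procedure, which this page does not describe. The feature names and the toy score function `toy_score` are assumptions made purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for `value`, a function from a frozenset of features to a score."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight of a coalition of size k among n features.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_score(subset):
    """Hypothetical correctness score for a subset of context features."""
    score = 0.0
    if "observations" in subset:
        score += 2.0
    if "actions" in subset:
        score += 1.0
    if "observations" in subset and "actions" in subset:
        score += 1.0  # assumed synergy between the two features
    return score  # 'macros' contributes nothing in this toy example


features = ["observations", "actions", "macros"]
phi = shapley_values(features, toy_score)
total = sum(phi.values())
print(phi, total)
```

The individual values sum to the score obtained with all features minus the score with none, which is the efficiency property of Shapley values and matches the role of the cumulative Total bar in Figure 3. Exact enumeration grows exponentially with the number of features, so it is only practical for small feature sets like this one.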