Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making

Stelios Triantafyllou, Aleksa Sukovic, Yasaman Zolfimoselo, Goran Radanovic

TL;DR

This work addresses explaining counterfactual outcomes in multi-agent sequential decision making by introducing a causal explanation framework that decomposes the total counterfactual effect (TCFE) of an action into two principal pathways: propagation through subsequent agents (tot-ASE) and propagation through environment state transitions (r-SSE). It further refines these components by attributing tot-ASE to individual agents via Shapley-value-based agent-specific effects (ASE-SV) and allocating r-SSE across state variables through intrinsic causal contributions (ICC) with the r-SSE-ICC method. The central theoretical result (Theorem 3.3) shows that TCFE = tot-ASE - r-SSE, connecting agent-centric and state-centric mechanisms, and it is complemented by an algorithmic approach for estimating the conditional variances needed for the counterfactual attributions. Empirical validation in Gridworld with an LLM-guided planner and in a sepsis management simulator demonstrates the interpretability and practical utility of the decomposition, highlighting the distinct roles of agents and environment in shaping counterfactual outcomes. These contributions offer a principled, interpretable toolkit for accountability in multi-agent decision systems and lay the groundwork for extensions under partial identifiability and scalable computation.
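
As a rough illustration of the ASE-SV step, the sketch below computes exact Shapley values over a small set of agents. The characteristic function is a hypothetical lookup table `ASE_TABLE` standing in for the agent-specific effect through each agent subset (in practice these values would have to be estimated by counterfactual simulation); the numbers are made up. It assumes the effect through the empty set is zero and the effect through all agents plays the role of tot-ASE, so Shapley efficiency makes the agents' scores sum to tot-ASE.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating all coalitions (exponential in len(players))."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # sizes of coalitions that do not contain p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of agent p when added to coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical agent-specific effects for every subset of two agents
# (made-up numbers; the empty set has effect 0, the full set plays tot-ASE).
ASE_TABLE = {
    frozenset(): 0.0,
    frozenset({"ai"}): -0.10,
    frozenset({"cl"}): -0.30,
    frozenset({"ai", "cl"}): -0.50,
}

scores = shapley_values(["ai", "cl"], ASE_TABLE.__getitem__)
print(scores)
```

Exact enumeration is only practical for a handful of agents; with many agents one would typically switch to a sampling approximation of the Shapley value.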

Abstract

We address the challenge of explaining counterfactual outcomes in multi-agent Markov decision processes. In particular, we aim to explain the total counterfactual effect of an agent's action on the outcome of a realized scenario through its influence on the environment dynamics and the agents' behavior. To achieve this, we introduce a novel causal explanation formula that decomposes the counterfactual effect by attributing to each agent and state variable a score reflecting their respective contributions to the effect. First, we show that the total counterfactual effect of an agent's action can be decomposed into two components: one measuring the effect that propagates through all subsequent agents' actions and another related to the effect that propagates through the state transitions. Building on recent advancements in causal contribution analysis, we further decompose these two effects as follows. For the former, we consider agent-specific effects -- a causal concept that quantifies the counterfactual effect of an agent's action that propagates through a subset of agents. Based on this notion, we use Shapley value to attribute the effect to individual agents. For the latter, we consider the concept of structure-preserving interventions and attribute the effect to state variables based on their "intrinsic" contributions. Through extensive experimentation, we demonstrate the interpretability of our approach in a Gridworld environment with LLM-assisted agents and a sepsis management simulator.
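
The "intrinsic" contribution idea can be illustrated in its simplest setting: a linear structural causal model with independent noise terms, where the contribution of each variable to a target is the Shapley value, over the variables' own noise terms, of the reduction in the target's conditional variance. The sketch below is a generic variance-based ICC under that assumption, not the paper's r-SSE-ICC (which applies the idea to the counterfactual quantity r-SSE given a realized trajectory); the model `SCM` and the function names are hypothetical, and the conditional variances are computed analytically rather than estimated.

```python
from itertools import combinations
from math import factorial
from typing import Dict, List, Tuple

# Hypothetical linear SCM in topological order: each variable equals a weighted
# sum of its parents plus its own independent noise (noise std given last).
# X1 = N1,  X2 = 2*X1 + N2,  Y = 0.5*X1 + X2 + N3
SCM: List[Tuple[str, Dict[str, float], float]] = [
    ("X1", {}, 1.0),
    ("X2", {"X1": 2.0}, 1.0),
    ("Y", {"X1": 0.5, "X2": 1.0}, 0.5),
]


def noise_coefficients(scm, target: str) -> Dict[str, float]:
    """Write `target` as sum_j c_j * N_j by propagating coefficients in order."""
    coeffs: Dict[str, Dict[str, float]] = {}
    for name, parents, _ in scm:
        c = {name: 1.0}  # the variable's own noise enters with coefficient 1
        for parent, w in parents.items():
            for noise, pc in coeffs[parent].items():
                c[noise] = c.get(noise, 0.0) + w * pc
        coeffs[name] = c
    return coeffs[target]


def icc(scm, target: str) -> Dict[str, float]:
    """Shapley value over noise terms of nu(S) = Var(Y) - Var(Y | N_S)."""
    std = {name: s for name, _, s in scm}
    c = noise_coefficients(scm, target)
    players = list(c)

    def cond_var(given) -> float:
        # Independent noises: conditioning on N_S removes exactly their terms.
        return sum(c[j] ** 2 * std[j] ** 2 for j in players if j not in given)

    total = cond_var(set())
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                gain_with = total - cond_var(set(s) | {p})
                gain_without = total - cond_var(set(s))
                phi[p] += w * (gain_with - gain_without)
    return phi


print(icc(SCM, "Y"))
```

In this linear case the value function is additive, so each contribution reduces to c_j^2 times the variance of N_j and the contributions sum to Var(Y); in nonlinear models the conditional variances no longer separate and must be estimated, which is where an estimation procedure becomes necessary.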

Paper Structure

This paper contains 39 sections, 3 theorems, 17 equations, 10 figures, 3 tables, 4 algorithms.

Key Result

Theorem 3.3

The total counterfactual effect (TCFE), total agent-specific effect (tot-ASE) and reverse state-specific effect (r-SSE) obey the following relationship:

$$\text{TCFE} = \text{tot-ASE} - \text{r-SSE}$$
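
Taking the relationship as TCFE = tot-ASE - r-SSE (as stated in the TL;DR), the two refinements can be substituted into it to obtain one score per agent and per state variable. The derivation below is a sketch under two assumptions not spelled out in this summary: the ASE-SV scores $\phi_j$ satisfy Shapley efficiency, with the agent-specific effect through the empty set equal to zero and through the full agent set $\mathcal{N}$ equal to tot-ASE, and the r-SSE-ICC scores $\psi_k$ of the state variables in a (hypothetically named) set $\mathcal{V}$ sum to r-SSE.

```latex
\begin{aligned}
\text{TCFE} &= \text{tot-ASE} - \text{r-SSE} \\
            &= \sum_{j \in \mathcal{N}} \phi_j \;-\; \sum_{k \in \mathcal{V}} \psi_k .
\end{aligned}
```

Under these assumptions, each agent's score enters the explanation with a positive sign, while each state variable's score enters with a negative sign, mirroring the minus sign in front of r-SSE.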

Figures (10)

  • Figure 1: The trajectory panel depicts (part of) a simulated scenario from the two-agent Sepsis environment described in the Sepsis experiments section, in which the patient's treatment fails. The same panel also includes the values from a sampled counterfactual scenario (values that differ are shown in orange), in which the clinician's action is fixed to override the AI's action at step $10$; hence, the patient receives treatment A&V instead of A&E. The decomposition plot shows the results of our decomposition approach for this scenario.
  • Figure 2: The trajectory panel depicts the actors' movements in both the factual and the counterfactual trajectory used in our experiments. Initially, both $\mathcal{A}_1$ and $\mathcal{A}_2$ (represented by solid circles) are instructed to pick up the pink object and deliver it to the pink delivery location. In the counterfactual trajectory, $\mathcal{A}_2$ is forced to pick up the green object instead, prompting the Planner to issue an alternative instruction for delivery to the green location. This intervention does not affect $\mathcal{A}_1$'s behavior. A textual depiction of both trajectories is provided in the appendix with additional Gridworld results. The effects plot shows the values of various counterfactual effects computed on the trajectory's discounted total reward; a minus sign indicates that the negative of the corresponding value is plotted. The ICC plot shows the contribution ratios attributed to all state variables by r-SSE-ICC. Averages and standard errors are reported over $5$ seeds.
  • Figure 3: The two decomposition plots show, as average percentages of TCFE, the value of -r-SSE and the scores $\phi_{\text{cl}}$ and $\phi_{\text{ai}}$ attributed by ASE-SV, for interventions on the AI's and the clinician's actions, respectively, while varying the trust parameter $\mu$. The Gini plot shows the distribution of the Gini coefficient over the scores attributed to state variables by the r-SSE-ICC method. The x-axis displays how many rounds after the considered intervention the trajectory terminates.
  • Figure 4: The causal graph of an MMDP-SCM with $n$ agents and horizon $h$. Exogenous variables are omitted.
  • Figure 5: Gridworld: the plot shows the values of various counterfactual effects computed on the trajectory's total collected reward when we intervene on the Planner's action at Step 2, forcing it to instruct $\mathcal{A}_2$ to pick up the green object instead of the pink one. Averages and standard errors are reported over $5$ different seeds.
  • ...and 5 more figures
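
Figure 3 summarizes how concentrated the state-variable scores are with a Gini coefficient and reports decomposition terms as percentages of TCFE. A minimal sketch of both summaries follows, assuming the Gini coefficient is computed on the absolute values of the scores (the captions do not say how negative scores are handled) and that each percentage is a term divided by TCFE; the function names and the example numbers are hypothetical.

```python
from typing import Dict, Sequence


def gini(scores: Sequence[float]) -> float:
    """Gini coefficient via mean absolute difference.

    0 means the mass is spread evenly; (n - 1) / n means it all sits on one variable.
    """
    x = [abs(s) for s in scores]  # assumption: use magnitudes of the scores
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    abs_diffs = sum(abs(a - b) for a in x for b in x)  # over all ordered pairs
    return abs_diffs / (2 * n * total)


def percent_of_tcfe(components: Dict[str, float], tcfe: float) -> Dict[str, float]:
    """Express each decomposition term as a percentage of TCFE."""
    return {name: 100.0 * value / tcfe for name, value in components.items()}


# Made-up terms; TCFE is their sum, i.e. phi_ai + phi_cl + (-r-SSE).
terms = {"phi_ai": -0.15, "phi_cl": -0.35, "-r-SSE": 0.05}
print(percent_of_tcfe(terms, sum(terms.values())))
print(gini([0.1, 0.1, 0.7, 0.1]))
```

Because the terms sum to TCFE, the percentages sum to 100, although individual terms can exceed 100% or be negative when components push in opposite directions.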

Theorems & Definitions (12)

  • Definition 2.1: TCFE
  • Definition 3.1: tot-ASE
  • Definition 3.2: SSE
  • Theorem 3.3
  • Definition 4.1: r-SSE-ICC
  • Theorem 4.2
  • Definition 5.1: ASE
  • Definition 5.2: ASE-SV
  • Theorem 5.3
  • Definition 5.1: Noise Monotonicity
  • ...and 2 more