Explaining Reinforcement Learning: A Counterfactual Shapley Values Approach

Yiwei Shi, Qi Zhang, Kevin McAreavey, Weiru Liu

TL;DR

This work tackles explainability in reinforcement learning by introducing Counterfactual Shapley Values (CSV), a framework that combines counterfactual reasoning with Shapley value attributions to quantify how each state dimension contributes to action choices. It defines two counterfactual characteristic value functions (CVFs), the Counterfactual Difference CVF (CD) and the Average Counterfactual Difference CVF (ACD), and applies them to both action-value and state-value functions to produce detailed attributions. The authors validate CSV on GridWorld variants, FrozenLake, Taxi, Minesweeper, and Pendulum, showing that it yields interpretable, quantitative insights into why actions are chosen and how feature contributions differ across decisions. By providing both fine-grained, action-level explanations (CD) and broad, policy-level assessments (ACD), CSV supports more transparent and trustworthy RL deployments, with practical implications for debugging and trust in autonomous systems.
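For reference, the standard Shapley value of state dimension $i$, for a set $N$ of $n$ state dimensions and a characteristic value function $v$ (in CSV, one of the counterfactual CVFs), averages the marginal contribution of $i$ over every coalition $S$ of the other dimensions. The second identity (efficiency) says the attributions split the total gap $v(N) - v(\varnothing)$ exactly across the dimensions.

$$
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr),
\qquad
\sum_{i \in N} \phi_i(v) = v(N) - v(\varnothing).
$$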

Abstract

This paper introduces a novel approach, Counterfactual Shapley Values (CSV), which enhances explainability in reinforcement learning (RL) by integrating counterfactual analysis with Shapley values. The approach aims to quantify and compare the contributions of different state dimensions to various action choices. To analyze these impacts more accurately, we introduce new characteristic value functions, the "Counterfactual Difference Characteristic Value" and the "Average Counterfactual Difference Characteristic Value." These functions are used to compute Shapley values that evaluate the differences in contributions between optimal and non-optimal actions. Experiments across several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the effectiveness of the CSV method. The results show that the method not only improves transparency in complex RL systems but also quantifies the differences across various decisions.
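The abstract does not spell out how the two characteristic value functions are defined, so the following is only a minimal sketch under stated assumptions. Features outside a coalition are filled in from hypothetical background reference states and averaged (an interventional masking choice, not necessarily the paper's); the CD-CVF is taken as the Q-value gap between the chosen action and one counterfactual action; and the ACD-CVF is taken as the mean of that gap over all other actions. The helper names `masked_q`, `cd_cvf`, `acd_cvf`, the weights `W`, and the toy linear Q-function are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, v):
    """Exact Shapley values of characteristic function v over features 0..n-1."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                S = frozenset(subset)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi


def masked_q(q_fn, state, coalition, background, action):
    """Hypothetical masking: keep coalition features from `state`, fill the rest
    from each background state, and average Q over the background set."""
    total = 0.0
    for ref in background:
        hybrid = [state[j] if j in coalition else ref[j] for j in range(len(state))]
        total += q_fn(hybrid, action)
    return total / len(background)


def cd_cvf(q_fn, state, background, a_star, a_cf):
    """Assumed CD-CVF: Q gap between the chosen action a_star and one counterfactual action a_cf."""
    return lambda S: (masked_q(q_fn, state, S, background, a_star)
                      - masked_q(q_fn, state, S, background, a_cf))


def acd_cvf(q_fn, state, background, a_star, actions):
    """Assumed ACD-CVF: mean CD-CVF over every action other than a_star."""
    cds = [cd_cvf(q_fn, state, background, a_star, a) for a in actions if a != a_star]
    return lambda S: sum(cd(S) for cd in cds) / len(cds)


# Toy linear Q-function over a 2-dimensional state (illustrative only).
W = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}


def q(s, a):
    return sum(w * x for w, x in zip(W[a], s))


state = [1.0, 2.0]
background = [[0.0, 0.0]]
a_star = max(W, key=lambda a: q(state, a))  # greedy action under the toy Q (action 1)

cd_phi = shapley_values(2, cd_cvf(q, state, background, a_star, a_cf=0))
acd_phi = shapley_values(2, acd_cvf(q, state, background, a_star, actions=list(W)))
print("CD  (a* vs action 0):", cd_phi)
print("ACD (a* vs all others):", acd_phi)
```

Because the background state here is all zeros, each characteristic function is zero on the empty coalition, so by efficiency the CD attributions sum to $Q(s,1) - Q(s,0) = 1$ and the ACD attributions sum to the average gap $0.75$. Exact enumeration grows exponentially with the number of state dimensions; that cost is acceptable for small grid-world states but would call for sampling-based approximation on larger ones.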

Paper Structure

This paper contains 16 sections, 9 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Computation of Counterfactual Shapley Value using VANI-CVF, CD-CVF, and ACD-CVF
  • Figure 2: Comparison of GridWorld-1 and GridWorld-2, including their corresponding values of $V(s)$, $Q(s,a)$, and $U(s)$ based on $\pi^*$.
  • Figure 3: The upper-left and upper-right panels show State 2 and State 8, respectively, each given as the coordinates of the agent's current position: State 2 corresponds to [2,0] and State 8 to [2,2]. The lower half of the figure shows the Counterfactual Shapley Values for each state dimension of these two FrozenLake states.
  • Figure 4: State 1 and State 2 are two different scenarios in the Taxi environment. In State 1, the taxi is at [0,4], indicated by the yellow dot, and the passenger, shown by the blue dot, is also at position Green, which is [0,4]. The passenger is inside the taxi and their destination is Green, so the passenger has reached the destination but is still in the taxi; the optimal action in this state is to drop off. State 2 is represented as [0,0,0,1]: the taxi is at [0,0], indicated by the yellow dot, the passenger is at position Red and not inside the taxi, and the destination is Green, indicated by the green dot, which differs from both of those positions. The optimal action in this state is to pick up the passenger.
  • Figure 5: The first column shows the actual layout of the current game, which is hidden from the player: $M_1$ and $M_2$ mark the locations of the first and second mines, and the other numbers give the number of mines adjacent to each position. The second column shows the game state at its current progress, with blue characters marking known (observable) positions and red question marks marking unknown positions where mines may be present. The third column lists candidate actions; for example, "$M_2$?" denotes the player's guess at the location of the second mine, i.e., opening another question-mark position.
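In Figure 4, State 2 is given as the vector [0,0,0,1]. That is consistent with Gymnasium's Taxi-v3 convention of (taxi row, taxi column, passenger location, destination), where locations 0 to 3 are Red, Green, Yellow, and Blue, and passenger location 4 means "in the taxi". The caption does not name the convention, so the sketch below assumes it. It maps such vectors to and from Taxi-v3's integer state ids to make explicit the four dimensions that Shapley values are attributed over. The helper names are hypothetical; the arithmetic mirrors the Taxi-v3 encoding.

```python
# Location names indexed as in Gymnasium's Taxi-v3 (assumed to match the paper).
LOCS = {0: "Red", 1: "Green", 2: "Yellow", 3: "Blue"}


def encode_taxi(taxi_row, taxi_col, pass_loc, dest_idx):
    """Map a 4-dim Taxi state vector to a Taxi-v3 integer state id (0..499)."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode_taxi(state_id):
    """Inverse of encode_taxi: integer id -> [taxi_row, taxi_col, pass_loc, dest_idx]."""
    dest_idx = state_id % 4
    state_id //= 4
    pass_loc = state_id % 5
    state_id //= 5
    taxi_col = state_id % 5
    taxi_row = state_id // 5
    return [taxi_row, taxi_col, pass_loc, dest_idx]


def describe(vec):
    """Human-readable reading of each state dimension."""
    taxi_row, taxi_col, pass_loc, dest_idx = vec
    passenger = "in taxi" if pass_loc == 4 else f"at {LOCS[pass_loc]}"
    return f"taxi at [{taxi_row},{taxi_col}], passenger {passenger}, destination {LOCS[dest_idx]}"


# State 2 from the caption, and State 1 as described there (passenger in the taxi, heading to Green).
for vec in ([0, 0, 0, 1], [0, 4, 4, 1]):
    print(vec, encode_taxi(*vec), describe(vec))
```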