Explaining Reinforcement Learning: A Counterfactual Shapley Values Approach
Yiwei Shi, Qi Zhang, Kevin McAreavey, Weiru Liu
TL;DR
This work tackles explainability in reinforcement learning by introducing Counterfactual Shapley Values (CSV), a framework that combines counterfactual reasoning with Shapley value attributions to quantify how each state dimension contributes to action choices. It defines two counterfactual characteristic value functions, the Counterfactual Difference (CD) CVF and the Average Counterfactual Difference (ACD) CVF, and uses them with both action-value and state-value functions to produce detailed attributions. The authors evaluate CSV on GridWorld variants, FrozenLake, Taxi, Minesweeper, and Pendulum, showing that it yields interpretable, quantitative insights into why actions are chosen and how feature contributions differ across decisions. By providing both fine-grained action-level explanations (CD) and broad policy-level assessments (ACD), CSV supports more transparent RL deployments, with practical implications for debugging and trust in autonomous systems.
Abstract
This paper introduces Counterfactual Shapley Values (CSV), a novel approach that enhances explainability in reinforcement learning (RL) by integrating counterfactual analysis with Shapley values. The approach aims to quantify and compare the contributions of different state dimensions to various action choices. To analyze these contributions more accurately, we introduce two new characteristic value functions, the "Counterfactual Difference Characteristic Value" and the "Average Counterfactual Difference Characteristic Value." These functions are used to calculate Shapley values that evaluate the differences in contributions between optimal and non-optimal actions. Experiments across several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the effectiveness of the CSV method. The results show that the method not only improves transparency in complex RL systems but also quantifies the differences across various decisions.
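To make the idea of attributing an action choice to state dimensions concrete, the following is a minimal sketch, not the paper's implementation. The Shapley computation is the standard exact formula over all coalitions. The characteristic function is an assumption made for illustration, since this section does not give the paper's precise definitions: dimensions outside a coalition are replaced by a baseline value, and the coalition's value is the gap in Q between a chosen action and an alternative action. The names `shapley_values`, `counterfactual_difference`, `toy_q`, and the baseline masking scheme are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for n players, given value(frozenset) -> float."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition that excludes player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                c = frozenset(subset)
                # Marginal contribution of dimension i to coalition c
                phi[i] += weight * (value(c | {i}) - value(c))
    return phi


def counterfactual_difference(q, state, baseline, action, alt_action):
    """Hypothetical characteristic function (an assumption, not the paper's).

    Dimensions in the coalition keep their observed values, the rest take
    the baseline; the value is Q(masked, action) - Q(masked, alt_action).
    """
    def value(coalition):
        masked = tuple(
            state[j] if j in coalition else baseline[j]
            for j in range(len(state))
        )
        return q(masked, action) - q(masked, alt_action)
    return value


def toy_q(s, a):
    """Toy action-value function over a two-dimensional state (illustrative)."""
    x, y = s
    return 2.0 * x + y if a == 0 else x


state, baseline = (3.0, 1.0), (0.0, 0.0)
v = counterfactual_difference(toy_q, state, baseline, action=0, alt_action=1)
phi = shapley_values(len(state), v)
print(phi)
```

In this toy case the Q-value gap between the two actions is additive in the state dimensions, so each dimension's attribution equals its own contribution to the gap, and the attributions sum to the gap at the full state minus the gap at the baseline (the efficiency property of Shapley values). Exact enumeration scales exponentially with the number of dimensions, so larger state spaces would need a sampling approximation.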
