Approximating Shapley Explanations in Reinforcement Learning
Daniel Beechey, Özgür Şimşek
TL;DR
FastSVERL provides a scalable framework for approximating Shapley explanations in reinforcement learning. Rather than computing exact Shapley values, which requires evaluating a combinatorial number of feature subsets, it learns a Shapley predictor that estimates per-feature contributions across states and actions. The predictor is trained with amortised, differentiable least-squares losses over sampled feature subsets and states, and the efficiency constraint is enforced so that the predictions recover the true Shapley values in the limit. The approach handles temporal dependencies across multi-step trajectories, learns from off-policy data through importance sampling, and supports continual learning by updating explanations in tandem with policy updates; experiments in domains such as Mastermind and Gridworld demonstrate its convergence and scalability. A key extension replaces costly characteristic models with single-sample approximations, further improving efficiency while keeping the explanations unbiased. Overall, FastSVERL delivers principled, real-time interpretability for reinforcement learning and is applicable to broader learning settings.
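The least-squares characterisation of Shapley values that these losses build on can be illustrated for a single state. This is a minimal sketch, not the authors' implementation: the 4-feature state `s`, the toy characteristic function `v` (per-feature values plus an interaction between features 0 and 1), and the helpers `exact_shapley` and `sampled_ls_shapley` are all hypothetical. The sketch samples coalitions from the Shapley-kernel distribution (as in KernelSHAP), fits additive contributions by least squares, and imposes efficiency as an exact equality constraint. As described in the summary, FastSVERL instead amortises the objective by training a predictor across states; the closed-form constrained solve here is a swapped-in simplification for one state.

```python
import itertools
import math

import numpy as np

# Hypothetical 4-feature state and toy characteristic function v(S):
# each feature contributes its own value, and features 0 and 1 interact.
s = np.array([1.0, 2.0, -1.0, 0.5])


def v(coalition):
    """Value of a coalition (set of feature indices) for the fixed state s."""
    total = sum(s[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        total += s[0] * s[1]
    return total


def exact_shapley(v, n):
    """Exact Shapley values by enumerating every coalition: O(n * 2^n) calls to v."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (v(set(S) | {i}) - v(set(S)))
    return phi


def sampled_ls_shapley(v, n, num_samples, rng):
    """Least-squares Shapley estimate from sampled coalitions, efficiency enforced."""
    sizes = np.arange(1, n)
    # Shapley-kernel mass for each coalition size k is proportional to (n-1)/(k(n-k)).
    p_size = (n - 1) / (sizes * (n - sizes))
    p_size = p_size / p_size.sum()
    v_empty, v_full = v(set()), v(set(range(n)))

    Z = np.zeros((num_samples, n))  # row t is the 0/1 mask of sampled coalition t
    y = np.zeros(num_samples)       # y_t = v(S_t) - v(empty set)
    for t in range(num_samples):
        k = rng.choice(sizes, p=p_size)           # sample a coalition size
        S = rng.choice(n, size=k, replace=False)  # then a uniform coalition of that size
        Z[t, S] = 1.0
        y[t] = v(set(S.tolist())) - v_empty

    # Minimise mean((y - Z @ phi)^2) subject to sum(phi) = v_full - v_empty
    # by solving the KKT system [[2A, 1], [1^T, 0]] [phi; lam] = [2b; c].
    A = Z.T @ Z / num_samples
    b = Z.T @ y / num_samples
    ones = np.ones((n, 1))
    kkt = np.block([[2 * A, ones], [ones.T, np.zeros((1, 1))]])
    rhs = np.concatenate([2 * b, [v_full - v_empty]])
    return np.linalg.solve(kkt, rhs)[:n]


rng = np.random.default_rng(0)
print(exact_shapley(v, 4))                    # approximately [2.0, 3.0, -1.0, 0.5]
print(sampled_ls_shapley(v, 4, 10_000, rng))  # close to the exact values
```

Because efficiency is imposed as a hard constraint, the estimated contributions always sum to v(N) - v(empty set) = 4.5 for this state, while the individual values approach the exact ones as `num_samples` grows. Exact enumeration needs all 2^n coalitions, whereas the sampled estimator's cost is set by `num_samples`.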
Abstract
Reinforcement learning has achieved remarkable success in complex decision-making environments, yet its lack of transparency limits its deployment in practice, especially in safety-critical settings. Shapley values from cooperative game theory provide a principled framework for explaining reinforcement learning; however, their computational cost, which grows exponentially with the number of features, is an obstacle to their use. We introduce FastSVERL, a scalable method for explaining reinforcement learning by approximating Shapley values. FastSVERL is designed to handle the unique challenges of reinforcement learning, including temporal dependencies across multi-step trajectories, learning from off-policy data, and adapting to evolving agent behaviours in real time. Together, these properties make FastSVERL a practical, scalable approach to principled and rigorous interpretability in reinforcement learning.
