BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions
Xiao Liu, Jie Zhao, Wubing Chen, Mao Tan, Yongxing Su
TL;DR
Deep Reinforcement Learning models yield strong performance but suffer from opaque decision-making. This work introduces Backbone Extract Tree (BET), a self-interpretable model that uses Bones as class-specific cluster centroids and a tree-like Backbone to identify and explain error-prone states in a transparent hierarchical framework. BET optimizes Bones to minimize intra-cluster distances and performs posterior inference via Gaussian kernel similarities, yielding faithful, interpretable explanations across tasks including StarCraft II. The approach provides insights into agent behavior, error-prone regions, and perturbation-based decision changes, with potential for few-shot explanations and expanded semantic interpretation of state spaces in safety-sensitive domains.
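The two mechanisms named above are Bones as class-specific centroids trained to minimize intra-cluster distances, and posterior inference through Gaussian kernel similarities. They can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper. The kernel is the standard RBF form exp(-||x - b||^2 / (2 sigma^2)). Each class score is the sum of kernel similarities to that class's Bones, normalized across classes. The intra-cluster objective is the mean squared distance from each state to the nearest Bone of its own class. The helper names (`gaussian_kernel`, `class_posterior`, `intra_cluster_loss`) and the bandwidth `sigma` are hypothetical and serve only as illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """RBF similarity exp(-||a - b||^2 / (2 sigma^2)); sigma is a hypothetical bandwidth."""
    return math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2))

def class_posterior(state: Vector, bones: Dict[str, List[Vector]],
                    sigma: float = 1.0) -> Dict[str, float]:
    """Assumed aggregation: sum the kernel similarities to each class's Bones,
    then normalize across classes so the scores sum to 1."""
    scores = {c: sum(gaussian_kernel(state, b, sigma) for b in bs)
              for c, bs in bones.items()}
    total = sum(scores.values())
    if total == 0.0:
        # Every similarity underflowed to 0: fall back to a uniform posterior.
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}

def intra_cluster_loss(states: List[Vector], labels: List[str],
                       bones: Dict[str, List[Vector]]) -> float:
    """Mean squared distance from each state to the nearest Bone of its own class."""
    dists = [min(sq_dist(s, b) for b in bones[y]) for s, y in zip(states, labels)]
    return sum(dists) / len(dists)
```

For example, with one Bone per class at `(0, 0)` ("left") and `(4, 0)` ("right"), the state `(0.5, 0)` receives a higher "left" posterior. A state exactly halfway between the two Bones receives 0.5 for each class. Summing over all Bones of a class, rather than taking only the nearest one, is one possible choice among several. The paper's actual aggregation may differ.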
Abstract
Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identifying the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently makes the same decision are less prone to errors. To model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Consequently, states that lie farther from these representative states are more prone to error. We evaluate BET in various popular RL environments and show its superiority over existing self-interpretable models in terms of explanation fidelity. Furthermore, we demonstrate a use case for providing explanations for the agents in StarCraft II, a sophisticated multi-agent cooperative game. To the best of our knowledge, we are the first to explain such a complex scenario using a fully transparent structure.
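The abstract hypothesizes that states farther from the representative states are more error-prone. A minimal sketch of that idea uses distance to the nearest representative state as an error-proneness score. It assumes Euclidean distance in some state or embedding space and a fixed cutoff. The functions `nearest_representative` and `flag_error_prone` and the `threshold` parameter are hypothetical illustrations of the stated hypothesis, not BET's actual procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def nearest_representative(state: Vector,
                           representatives: List[Vector]) -> Tuple[int, float]:
    """Return (index, Euclidean distance) of the closest representative state,
    or (-1, inf) if there are no representatives."""
    best_i, best_d = -1, math.inf
    for i, rep in enumerate(representatives):
        d = math.dist(state, rep)  # requires Python 3.8+
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

def flag_error_prone(states: List[Vector], representatives: List[Vector],
                     threshold: float) -> List[bool]:
    """Mark a state as error-prone when it lies farther than `threshold`
    (a hypothetical hyperparameter) from every representative state."""
    return [nearest_representative(s, representatives)[1] > threshold
            for s in states]
```

With representatives at `(0, 0)` and `(10, 0)` and a threshold of 2, the state `(0.5, 0)` is not flagged. The state `(5, 3)` lies about 5.83 from both representatives and is flagged as error-prone. Ranking states by the returned distance, instead of thresholding it, gives a graded rather than binary view of the same hypothesis.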
