XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques
Yu Xiong, Zhipeng Hu, Ye Huang, Runze Wu, Kai Guan, Xingchen Fang, Ji Jiang, Tianze Zhou, Yujing Hu, Haoyu Liu, Tangjie Lyu, Changjie Fan
TL;DR
This paper presents XRL-Bench, a unified benchmark for evaluating state-explanation methods in Explainable Reinforcement Learning, addressing the lack of standardized evaluation in XRL. It defines a three-module framework (environments, state-explanation explainers, and evaluators) and supports both tabular and image state inputs, along with a novel TabularSHAP method. The authors propose objective fidelity and stability metrics and provide extensive benchmarking across multiple RL environments, revealing that TabularSHAP achieves superior fidelity on tabular data while SHAP-based gradient methods excel for image data. A real-world online gaming case study demonstrates practical utility for debugging and improving RL-driven AI bots, and the work is complemented by an open-source benchmark platform to promote reproducibility and extension. The contribution advances XRL research by delivering a repeatable, scalable evaluation framework and a competitive new explainability method with demonstrated industrial relevance.
Abstract
Reinforcement Learning (RL) has demonstrated substantial potential across diverse fields, yet understanding its decision-making process, especially in real-world scenarios where rationality and safety are paramount, is an ongoing challenge. This paper delves into Explainable RL (XRL), a subfield of Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our focus rests on state-explaining techniques, a crucial subset within XRL methods, as they reveal the underlying factors influencing an agent's actions at any given time. Despite their significant role, the lack of a unified evaluation framework hinders assessment of their accuracy and effectiveness. To address this, we introduce XRL-Bench, a unified, standardized benchmark tailored for the evaluation and comparison of XRL methods, encompassing three main modules: standard RL environments, explainers based on state importance, and standard evaluators. XRL-Bench supports both tabular and image data for state explanation. We also propose TabularSHAP, an innovative and competitive XRL method. We demonstrate the practical utility of TabularSHAP in real-world online gaming services and offer an open-source benchmark platform for the straightforward implementation and evaluation of XRL methods. Our contributions facilitate the continued progression of XRL technology.
