On Generating Explanations for Reinforcement Learning Policies: An Empirical Study

Mikihisa Yuasa; Huy T. Tran; Ramavarapu S. Sreenivas

TL;DR

This work tackles the explainability gap of reinforcement learning policies by introducing a class of linear temporal logic (LTL) explanations and a greedy local search that identifies the best explanation for a given target policy. Explanations take the form $φ = F(φ_F) ∧ G(φ_G)$, where the "eventually" operator $F$ captures what the policy ultimately achieves and the "globally" operator $G$ captures the conditions it maintains throughout execution. Each candidate is evaluated by translating it into an FSPA-augmented MDP, optimizing a policy under the candidate's reward structure, and comparing that policy to the target policy via a weighted KL divergence $U^{φ}$. The approach is validated in three simulated domains, capture-the-flag (CtF), car parking, and robot navigation, using PPO and SAC+HER. It recovers the target explanations and proposes plausible alternatives, while ablation tests show that expansion/extension and weighting are important for avoiding local optima. The work discusses limitations such as computational complexity and dependence on predefined predicates, and outlines future directions including predicate automation, natural-language rendering, and scaling through neural LTL representations to improve practical applicability in safety-critical systems.
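The evaluate-and-search loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the state weights, the neighborhood function, the convention that a lower divergence means a better match, and every name below (`weighted_kl`, `greedy_local_search`, the toy policies) are assumptions made for illustration. In the paper each candidate policy comes from RL training on an FSPA-augmented MDP; here the candidate policies are simply given as tables.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

Policy = Dict[str, List[float]]  # state -> action probabilities (toy, tabular)


def weighted_kl(target: Policy, candidate: Policy,
                weights: Dict[str, float], eps: float = 1e-12) -> float:
    """Weighted sum over states of KL(target(.|s) || candidate(.|s)).

    The weights and the KL direction are assumptions of this sketch.
    """
    total = 0.0
    for state, w in weights.items():
        p, q = target[state], candidate[state]
        kl = sum(pi * math.log(pi / max(qi, eps))
                 for pi, qi in zip(p, q) if pi > 0.0)
        total += w * kl
    return total


def greedy_local_search(start: str,
                        neighbors: Callable[[str], Iterable[str]],
                        score: Callable[[str], float],
                        max_steps: int = 100) -> Tuple[str, float]:
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current, current_score = start, score(start)
    for _ in range(max_steps):
        scored = [(score(n), n) for n in neighbors(current)]
        if not scored:
            break
        best_score, best = min(scored, key=lambda t: t[0])
        if best_score >= current_score:
            break  # local optimum under this neighborhood
        current, current_score = best, best_score
    return current, current_score


# Toy usage: three hypothetical candidate explanations "A", "B", "C".
target = {"s0": [0.9, 0.1], "s1": [0.2, 0.8]}
weights = {"s0": 0.5, "s1": 0.5}
policies = {
    "A": {"s0": [0.5, 0.5], "s1": [0.5, 0.5]},
    "B": {"s0": [0.8, 0.2], "s1": [0.3, 0.7]},
    "C": {"s0": [0.9, 0.1], "s1": [0.2, 0.8]},
}
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

best, best_score = greedy_local_search(
    "A", lambda name: graph[name],
    lambda name: weighted_kl(target, policies[name], weights))
print(best, best_score)
```

Because a purely greedy search stops at the first candidate with no better neighbor, it can stall in a local optimum, which is the failure mode the expansion/extension steps mentioned above are meant to address.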

Abstract

Understanding a \textit{reinforcement learning} policy, which guides state-to-action mappings to maximize rewards, necessitates an accompanying explanation for human comprehension. In this paper, we introduce a set of \textit{linear temporal logic} formulae designed to provide explanations for policies, and an algorithm for searching through those formulae for the one that best explains a given policy. Our focus is on explanations that elucidate both the ultimate objectives accomplished by the policy and the prerequisite conditions it upholds throughout its execution. The effectiveness of our proposed approach is illustrated through a simulated game of capture-the-flag and a car-parking environment,

Paper Structure (13 sections, 9 equations, 3 figures, 2 tables, 2 algorithms)

Introduction
Background
Reinforcement Learning
Linear Temporal Logic and FSPA-Augmented MDPs
Method
Definition of Explanations
Evaluation of Explanations
Neighborhood Definition and Evaluation
Additional Neighborhood Expansion & Extension
Results
Test Environments and RL Training Details
Experiment Results
Conclusions

Figures (3)

Figure 1: Overview of our proposed search algorithm.
Figure 2: Screenshots from our environments.
Figure 3: A partial trace of CtF Search 5 with wKL divergence values.
