SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies

Amir Samadi; Konstantinos Koufos; Kurt Debattista; Mehrdad Dianati

TL;DR

This work tackles the explainability gap of deep reinforcement learning (DRL) in safety-critical tasks by introducing SAFE-RL, a saliency-guided counterfactual (CF) explainer that generates temporally coherent CFs with changes confined to salient input regions. The approach combines Eigen-CAM saliency maps with an AttentionGAN-based generator to produce plausible CF states that lead the agent to a targeted alternative action. It is evaluated on automated driving scenarios and classic Atari games with three grey-box DRL agents (DQN, PPO and A2C). SAFE-RL improves CF quality metrics such as proximity, sparsity and validity over state-of-the-art baselines while keeping the CF visuals realistic, and its datasets and code are publicly released. The work advances explainable AI for DRL by enabling more informative and feasible CF explanations that can aid deployment in real-world systems such as automated driving systems (ADS), and it offers a reference point for future research on temporally aware CF generation.
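The idea of confining changes to salient regions can be illustrated with a minimal sketch. This is not the SAFE-RL generator, which learns its outputs with an AttentionGAN-based network; it is a hand-written blend, assuming a flattened stack of pixel intensities, a saliency map already normalised to [0, 1], and a hypothetical `threshold` parameter. The function name `saliency_masked_blend` is an illustration only.

```python
def saliency_masked_blend(state, candidate, saliency, threshold=0.5):
    """Blend a candidate CF into the original state, only where saliency is high.

    All three arguments are flat sequences of floats of equal length
    (an assumed representation, e.g. a flattened stack of frames).
    Pixels whose saliency is below `threshold` are left unchanged.
    """
    if not (len(state) == len(candidate) == len(saliency)):
        raise ValueError("state, candidate and saliency must have equal length")

    blended = []
    for s, c, m in zip(state, candidate, saliency):
        # Weight is the saliency value inside salient regions, zero elsewhere.
        weight = m if m >= threshold else 0.0
        blended.append(s + weight * (c - s))
    return blended


# Toy example: only the third pixel is salient enough to be modified.
original = [0.0, 0.2, 0.8, 1.0]
proposal = [1.0, 1.0, 0.0, 0.0]
saliency_map = [0.0, 0.4, 1.0, 0.1]
cf_state = saliency_masked_blend(original, proposal, saliency_map)
```

In this toy case the blend leaves three of the four pixels untouched, which is the kind of constrained modification that keeps a CF close to the original state.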

Abstract

While Deep Reinforcement Learning (DRL) has emerged as a promising solution for intricate control tasks, the lack of explainability of the learned policies impedes its uptake in safety-critical applications, such as automated driving systems (ADS). Counterfactual (CF) explanations have recently gained prominence for their ability to interpret black-box Deep Learning (DL) models. CF examples are associated with minimal changes in the input that result in a complementary output by the DL model. Finding such alterations, particularly for high-dimensional visual inputs, poses significant challenges. Moreover, the action of a DRL agent depends on a history of past state observations, and this temporal dependency further complicates the generation of CF examples. To address these challenges, we propose using a saliency map to identify the most influential input pixels across the sequence of past states observed by the agent. We then feed this map to a deep generative model, enabling the generation of plausible CFs with constrained modifications centred on the salient regions. We evaluate the effectiveness of our framework in diverse domains, including ADS and the Atari games Pong, Pacman and Space Invaders, using traditional performance metrics such as validity, proximity and sparsity. Experimental results demonstrate that this framework generates more informative and plausible CFs than the state-of-the-art for a wide range of environments and DRL agents. To foster research in this area, we have made our datasets and code publicly available at https://github.com/Amir-Samadi/SAFE-RL.
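The three metrics named above can be made concrete with a small sketch. The exact formulas used in the paper are not given in this summary, so the definitions below are common conventions taken as assumptions: validity as the fraction of CF states for which the policy picks the desired action, proximity as the mean absolute pixel difference, and sparsity as the fraction of pixels changed. The `policy` callable, the `tol` parameter and the flat-list state representation are hypothetical.

```python
def validity(policy, cf_states, target_actions):
    """Fraction of CF states on which the policy selects the desired action."""
    hits = sum(1 for s, a in zip(cf_states, target_actions) if policy(s) == a)
    return hits / len(cf_states)


def proximity(state, cf_state):
    """Mean absolute difference between original and CF pixels (lower is closer)."""
    return sum(abs(c - s) for s, c in zip(state, cf_state)) / len(state)


def sparsity(state, cf_state, tol=1e-6):
    """Fraction of pixels changed by more than `tol` (lower means fewer edits)."""
    changed = sum(1 for s, c in zip(state, cf_state) if abs(c - s) > tol)
    return changed / len(state)


# Toy stand-in for a frozen agent: slows down when the scene is "crowded".
def toy_policy(state):
    return "slow_down" if sum(state) > 1.0 else "accelerate"


original = [0.0, 0.0, 0.5, 0.0]
counterfactual = [0.0, 1.0, 0.5, 0.0]  # one extra "vehicle" pixel added

v = validity(toy_policy, [counterfactual], ["slow_down"])
p = proximity(original, counterfactual)
sp = sparsity(original, counterfactual)
```

Under these assumed definitions, a good CF has high validity together with low proximity and sparsity values: the agent changes its action although only a small part of the input was altered.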

Paper Structure (15 sections, 8 equations, 8 figures, 2 tables)

Introduction
State-of-the-art
Contributions
Saliency-Aware CF DRL Explainer
Problem Definition
Overview of the Proposed Framework
Dataset
SAFE-RL
Evaluations
Implementation Details
Performance Metrics
Highway, Roundabout and Atari Games Environments
Performance Evaluation
Quality of Counterfactual Explanations
Conclusions

Figures (8)

Figure 1: Overview of the CF generation method for DRL agents. The input and the output of the CF explainer each consist of four images, but only the most recent one is depicted for clarity. The DRL agent (the lock symbolises frozen parameters) initially decides to accelerate the EGO vehicle (green rectangle), as depicted on the left-hand side, whereas the action selected by the user is to slow down the EGO vehicle without performing a lane change. SAFE-RL therefore generates a CF state in which the EGO vehicle is surrounded by participant vehicles (blue rectangles), as depicted on the right-hand side.
Figure 2: Generation process of the dataset including the states, actions and saliency maps for a grey-box DRL agent.
Figure 3: (a) Block diagram of the SAFE-RL model. (b) Generator network model details.
Figure 4: Visual comparison of the SOTA approach and SAFE-RL across three distinct DRL agents: DQN, PPO, and A2C. The initial action is denoted by $a$ and the desired action by $a'$. The EGO vehicle (EV) is portrayed as a green box, alongside participant vehicles (PVs, blue boxes).
Figure 5: Visual comparison of the Huber and SAFE-RL models across three different DRL agents: DQN, PPO, and A2C. The original action is denoted by $a$ and the desired action by $a'$. The EV is depicted as a green box, alongside PVs (blue boxes).
...and 3 more figures
