Semifactual Explanations for Reinforcement Learning

Jasmina Gajcin, Jovan Jeromela, Ivana Dusparic

TL;DR

This work tackles the explainability gap in deep reinforcement learning by introducing semifactual explanations, which describe how an outcome would remain the same under plausible past or future changes. It defines five RL-specific semifactual properties and proposes two model-agnostic NSGA-II-based algorithms, SGRL-Advance and SGRL-Rewind, to optimize them. Empirical evaluation in Stochastic Gridworld and Frozen Lake shows that semifactuals produced by SGRL methods are more faithful to the policy and more diverse than a supervised-learning baseline (S-GEN), with a user study indicating potential improvements in user understanding. Overall, the paper lays foundational work for RL-specific semifactual explanations and points to future work on more complex domains and offline settings.

Abstract

Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error. Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret. Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks. Semifactual explanations aim to explain an outcome by providing "even if" scenarios, such as "even if the car were moving twice as slowly, it would still have to swerve to avoid crashing". Semifactuals help users understand the effects of different factors on the outcome and support the optimisation of resources. While extensively studied in psychology and even utilised in supervised learning, semifactuals have not been used to explain the decisions of RL systems. In this work, we develop a first approach to generating semifactual explanations for RL agents. We start by defining five properties of desirable semifactual explanations in RL and then introduce SGRL-Rewind and SGRL-Advance, the first algorithms for generating semifactual explanations in RL. We evaluate the algorithms in two standard RL environments and find that they generate semifactuals that are easier to reach, represent the agent's policy better, and are more diverse compared to baselines. Lastly, we conduct and analyse a user study to assess participants' perception of semifactual explanations of the agent's actions.
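
To make the "even if" idea concrete, the following is a minimal Python sketch, not the SGRL-Advance or SGRL-Rewind algorithm. It assumes that the outcome being explained is simply the action the agent chooses, and it uses a hypothetical hand-written gridworld policy (`toy_policy`) and a hypothetical helper (`semifactual_candidates`) that keeps only the alternative states in which that action stays the same. The paper's five properties and its NSGA-II search are not modelled here.

```python
from typing import Callable, Iterable, List, Tuple

State = Tuple[int, int]  # (row, col) in a hypothetical gridworld


def toy_policy(state: State) -> str:
    """Hypothetical stand-in for a trained agent: move right until the goal column, then down."""
    _row, col = state
    goal_col = 3  # assumed goal column, for illustration only
    return "right" if col < goal_col else "down"


def semifactual_candidates(
    policy: Callable[[State], str],
    state: State,
    candidates: Iterable[State],
) -> List[State]:
    """Keep the candidate states in which the policy's action is unchanged.

    Assumption: the 'outcome' is the chosen action, so a candidate s' supports
    the statement "even if the state were s', the agent would still act the same".
    """
    target_action = policy(state)
    return [s for s in candidates if s != state and policy(s) == target_action]


if __name__ == "__main__":
    state = (0, 1)
    neighbours = [(0, 0), (0, 2), (1, 1), (0, 3)]
    # Candidates where the agent still moves "right" are kept; (0, 3) is dropped.
    print(semifactual_candidates(toy_policy, state, neighbours))
```

In this toy setting the filter only checks that the outcome is preserved; a fuller method would also have to rank the surviving candidates, for example by how reachable or how diverse they are, which is where a multi-objective search would come in.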

Paper Structure (27 sections, 12 equations, 2 figures, 4 tables, 1 algorithm)

Figures (2)

  • Figure 1: Forward and backward semifactuals: In an agricultural task, semifactuals can be used to explain why the yielded fruit weight at a specific time was not higher. The backward semifactual looks into the past and reports that more rain would not have changed the fruit weight. Conversely, the forward semifactual looks into the future and states that more watering in the future is not going to change the outcome. Dashed lines represent the effect of stochastic processes on the environment's state. In this case, a backward semifactual relies on a non-actionable change of weather, while the forward one explains the outcome through an actionable change.
  • Figure 2: User satisfaction scores for semifactuals generated by S-GEN, SGRL-Advance and SGRL-Rewind, based on the explanation goodness metrics of Hoffman et al. (2018).

Theorems & Definitions (2)

  • definition 1: Outcome
  • definition 2