ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies

Jasmina Gajcin, Ivana Dusparic

TL;DR

This work tackles the lack of actionable counterfactual explanations in RL by introducing ACTER, which generates sequences of past actions that would have avoided a failure, using a multi-objective NSGA-II optimization over five properties. ACTER produces a diverse set of actionable counterfactual sequences that remain valid under stochastic environmental configurations, with three novel diversity metrics to benchmark explanations. The approach is demonstrated in Highway driving and Farm RL tasks (covering discrete and continuous actions) and is accompanied by a user study showing ACTER explanations are perceived as more useful, detailed, and actionable, albeit with nuanced effects on diagnostic performance for non-experts. Together, ACTER advances debugging, personalization, and trust in RL by offering practical, diverse, and robust recourse options for preventing negative outcomes.
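NSGA-II, the optimizer named above, ranks candidate solutions by Pareto dominance before applying crowding-distance selection. The Python sketch below shows only the non-dominated sorting step and assumes that every objective is minimized. The two-objective score vectors in the example (number of changed actions, probability of failure) are hypothetical stand-ins for illustration. They are not ACTER's five properties.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_fronts(scores: List[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sorting (the ranking step of NSGA-II).

    Returns a list of fronts, each a list of candidate indices; front 0 is the
    Pareto-optimal set.
    """
    n = len(scores)
    dominated = [[] for _ in range(n)]  # dominated[p]: candidates that p dominates
    dominators = [0] * n  # dominators[p]: how many candidates dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(scores[p], scores[q]):
                dominated[p].append(q)
            elif dominates(scores[q], scores[p]):
                dominators[p] += 1
        if dominators[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated[p]:
                dominators[q] -= 1
                # q joins the next front once all of its dominators are ranked.
                if dominators[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical scores: (number of changed actions, probability of failure).
scores = [(1, 0.2), (2, 0.1), (2, 0.3), (3, 0.4)]
print(non_dominated_fronts(scores))  # [[0, 1], [2], [3]]
```

Candidates 0 and 1 trade off against each other: one changes fewer actions and the other fails less often. Neither dominates the other, so both land in the first front. A multi-objective search returns a set of such options instead of a single answer.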

Abstract

Understanding how failure occurs and how it can be prevented in reinforcement learning (RL) is necessary to enable debugging, maintain user trust, and develop personalized policies. Counterfactual reasoning has often been used to assign blame and understand failure by searching for the closest possible world in which the failure is avoided. However, current counterfactual state explanations in RL can only explain an outcome using the current state features and offer no actionable recourse on how a negative outcome could have been prevented. In this work, we propose ACTER (Actionable Counterfactual Sequences for Explaining Reinforcement Learning Outcomes), an algorithm for generating counterfactual sequences that provide actionable advice on how failure can be avoided. ACTER investigates the actions leading to a failure and uses the evolutionary algorithm NSGA-II to generate counterfactual sequences of actions that prevent it with minimal changes and high certainty, even in stochastic environments. Additionally, ACTER generates a diverse set of counterfactual sequences, enabling users to correct failure in the way that best fits their preferences. We also introduce three diversity metrics for evaluating the diversity of counterfactual sequences. We evaluate ACTER in two RL environments, with both discrete and continuous actions, and show that it can generate actionable and diverse counterfactual sequences. Finally, we conduct a user study to explore how explanations generated by ACTER help users identify and correct failure.
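Preventing failure "with minimal changes" requires some measure of distance between the original action sequence and a counterfactual one. The sketch below is one possible measure and assumes a simple choice that is not taken from the paper. For discrete actions it counts the time steps whose action changed. For continuous actions it sums the absolute differences. The function names and example values are hypothetical.

```python
from typing import Sequence, Tuple


def _check_same_length(original: Sequence, counterfactual: Sequence) -> None:
    if len(original) != len(counterfactual):
        raise ValueError("original and counterfactual sequences must have equal length")


def changed_actions(original: Sequence[int], counterfactual: Sequence[int]) -> int:
    """Discrete actions: number of time steps whose action was changed."""
    _check_same_length(original, counterfactual)
    return sum(1 for a, b in zip(original, counterfactual) if a != b)


def continuous_distance(original: Sequence[Tuple[float, ...]],
                        counterfactual: Sequence[Tuple[float, ...]]) -> float:
    """Continuous actions: L1 distance summed over time steps and action dimensions."""
    _check_same_length(original, counterfactual)
    return float(sum(abs(x - y)
                     for a, b in zip(original, counterfactual)
                     for x, y in zip(a, b)))


# Highway-style discrete actions (hypothetical encoding): the 2nd and 4th actions differ.
print(changed_actions([0, 1, 1, 2], [0, 2, 1, 0]))  # 2
# Farm-style continuous actions, e.g. water amounts (hypothetical values).
print(continuous_distance([(0.5,), (1.0,)], [(0.5,), (1.5,)]))  # 0.5
```

In a multi-objective search, a measure like this would be one objective to minimize, alongside objectives such as avoiding the failure.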

Paper Structure (27 sections, 9 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: Lack of actionability of current counterfactual explanations. Left: the ego vehicle (green) crashes into another vehicle (blue). Center and right: two possible counterfactuals stating that, had the agent been in a different lane, it would not have crashed. However, only the right counterfactual is actionable, as the agent could only have switched to the right lane without crashing into another car. Without examining the agent's execution history, it is impossible to see how the crash could have been prevented.
  • Figure 2: Stochastic uncertainty and validity evaluation in ACTER in a farming scenario. In an episode that ends with a plant dying, an alternative sequence of actions that increases the amount of water in the first action is examined. We simulate the episode with the alternative sequence of actions under different stochastic conditions, represented by the weather. Validity indicates whether the plant would still die after the alternative sequence of actions under the sunny weather conditions present in the original episode. Stochastic uncertainty is calculated as the probability of the plant surviving over all simulated weather conditions.
  • Figure 3: Examples of user study explanations shown to the users. Left: Counterfactual explanation. Right: Non-actionable explanation.
  • Figure 4: Examples of user study questions. Left: Question about identifying action responsible for failure. Right: Question about alternative action to prevent failure.
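The validity and stochastic uncertainty checks described in the Figure 2 caption can be sketched with a toy farm simulator. Everything about the simulator is hypothetical: the evaporation rate for each weather type, the initial soil moisture, and the survival threshold. Only the two evaluation rules follow the caption. Validity reruns the alternative sequence under the weather of the original episode. The survival probability averages the outcomes over all simulated weather conditions.

```python
from typing import Dict, Sequence

# Hypothetical moisture lost per step under each weather condition.
EVAPORATION: Dict[str, float] = {"sunny": 3.0, "cloudy": 2.0, "rainy": 0.0, "heatwave": 4.0}


def plant_survives(watering: Sequence[float], weather: str,
                   initial_moisture: float = 5.0, min_moisture: float = 1.0) -> bool:
    """Toy episode: the plant dies as soon as soil moisture drops below the threshold."""
    moisture = initial_moisture
    for water in watering:
        moisture += water - EVAPORATION[weather]
        if moisture < min_moisture:
            return False
    return True


def is_valid(watering: Sequence[float], original_weather: str) -> bool:
    """Validity: does the alternative sequence avoid failure under the original weather?"""
    return plant_survives(watering, original_weather)


def survival_probability(watering: Sequence[float], weathers: Sequence[str]) -> float:
    """Fraction of simulated weather conditions in which the plant survives."""
    outcomes = [plant_survives(watering, w) for w in weathers]
    return sum(outcomes) / len(outcomes)


original = [1.0, 1.0, 1.0]
alternative = [3.0, 1.0, 1.0]  # more water in the first action
weathers = ["sunny", "cloudy", "rainy", "heatwave"]
print(is_valid(original, "sunny"), is_valid(alternative, "sunny"))  # False True
print(survival_probability(original, weathers))  # 0.5
print(survival_probability(alternative, weathers))  # 0.75
```

With these made-up numbers, adding water in the first step makes the sequence valid under the original sunny weather. The plant still dies in the heatwave, however, so its survival probability is 0.75 rather than 1.0. A sequence can therefore be valid yet still carry stochastic uncertainty.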