Integrating Policy Summaries with Reward Decomposition for Explaining Reinforcement Learning Agents

Yael Septon, Tobias Huber, Elisabeth André, Ofra Amir

TL;DR

This work tackles the explainability of reinforcement learning agents by integrating local reward decomposition (RD) with global policy summaries (HIGHLIGHTS). RD reveals which reward components drive specific decisions, while HIGHLIGHTS conveys the agent’s overall strategy through selected, important states. Two user studies across Highway and Pac-Man environments show that RD substantially improves users’ ability to infer agent priorities, with HIGHLIGHTS providing additional benefits in certain contexts; the combination can help when reward preferences are similar but does not consistently outperform RD alone. The findings underscore the value of coupling faithful, component-wise explanations with high-level summaries to support human understanding of RL agents in sequential decision-making tasks, and point to future work on expanding explanation suites and adapting to domain-specific complexities.

Abstract

Explaining the behavior of reinforcement learning agents operating in sequential decision-making settings is challenging, as their behavior is affected by a dynamic environment and delayed rewards. Methods that help users understand the behavior of such agents can roughly be divided into local explanations that analyze specific decisions of the agents and global explanations that convey the general strategy of the agents. In this work, we study a novel combination of local and global explanations for reinforcement learning agents. Specifically, we combine reward decomposition, a local explanation method that exposes which components of the reward function influenced a specific decision, and HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in decisive states. We conducted two user studies to evaluate the integration of these explanation methods and their respective benefits. Our results show significant benefits for both methods. In general, we found that the local reward decomposition was more useful for identifying the agents' priorities. However, when the agents' preferences differed only slightly, the global information provided by HIGHLIGHTS further improved participants' understanding.
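
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual formulation of reward decomposition, in which the Q-value of an action is the sum of per-component Q-values, and a simplified HIGHLIGHTS-style importance score, the gap between the best and worst total Q-value in a state. All function names, the component names other than "change lane", the action names, and every number are hypothetical.

```python
from typing import Dict, List

# component name -> action name -> Q-value for that component
DecomposedQ = Dict[str, Dict[str, float]]

# Hypothetical decomposed Q-values for a single state (illustrative numbers).
Q_COMPONENTS: DecomposedQ = {
    "change lane": {"left": 4.0, "idle": 1.0, "right": 3.5},
    "speed up": {"left": 1.0, "idle": 2.0, "right": 1.5},
    "safety": {"left": -0.5, "idle": 0.5, "right": -1.0},
}


def total_q(q_components: DecomposedQ) -> Dict[str, float]:
    """Sum the component Q-values per action (assumed additive decomposition)."""
    actions = next(iter(q_components.values())).keys()
    return {a: sum(comp[a] for comp in q_components.values()) for a in actions}


def greedy_action(q_components: DecomposedQ) -> str:
    """Return the action with the highest summed Q-value."""
    totals = total_q(q_components)
    return max(totals, key=totals.get)


def dominant_component(q_components: DecomposedQ, action: str) -> str:
    """Local explanation: the component contributing most to one action."""
    return max(q_components, key=lambda c: q_components[c][action])


def importance(q_components: DecomposedQ) -> float:
    """Simplified state importance: best minus worst total Q-value."""
    totals = total_q(q_components).values()
    return max(totals) - min(totals)


def select_summary_states(states: Dict[str, DecomposedQ], k: int) -> List[str]:
    """Global summary: names of the k states with the highest importance."""
    ranked = sorted(states, key=lambda s: importance(states[s]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    action = greedy_action(Q_COMPONENTS)
    print("chosen action:", action)
    print("dominant component:", dominant_component(Q_COMPONENTS, action))
    print("state importance:", importance(Q_COMPONENTS))
```

In this sketch, `select_summary_states` plays the role of the global summary (which states to show) and `dominant_component` plays the role of the local explanation (why the agent acted as it did in a shown state). The sketch omits other aspects of summary construction, such as showing the trajectory around each selected state.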

Paper Structure (18 sections, 12 figures, 4 tables)

Figures (12)

  • Figure 1: A screenshot from the experiment that used the Highway environment. The upper part of the image shows a specific state ("Scenario 2") extracted from an agent's behavior. The agent controls the green vehicle. The bottom part shows the reward bars corresponding to the state shown above. For each action (shown on the x-axis), the Q-values of the different reward components (depicted in different colors) are shown on the y-axis. In this case, the "change lane" component is the largest reward component affecting the behavior of this agent in this state. Users could switch to different states by choosing a scenario from the list. The states (scenarios) were chosen based on the summary method (HIGHLIGHTS or frequency-based). For conditions without local explanation, the reward bars were omitted and each scenario showed a short video.
  • Figure 2: A screenshot from the experiment that used the Pacman environment. The upper part of the image shows a specific state extracted from an agent's behavior. The bottom part shows the reward bars corresponding to the state shown above. For the action with the highest Q-value, the Q-values of the different reward components (depicted in different colors) are shown on the y-axis. In this case, the "eating normal pill" component is the largest reward component affecting the behavior of this agent in this state. Users could switch to different states by choosing a scenario from the list. The states (scenarios) were chosen based on the summary method (HIGHLIGHTS or frequency-based). For conditions without local explanation, the reward bars were omitted and each scenario showed a short video.
  • Figure 3: Participants' mean success rate in identifying the preferences averaged over all agents by condition in the Highway environment (A) and in the Pacman environment (B). The error bars show the 95% CI.
  • Figure 4: Participants' explanation satisfaction by condition in the Highway environment (A) and in the Pacman environment (B). The error bars show the 95% CI.
  • ...and 7 more figures