Visualizing and Understanding Atari Agents

Sam Greydanus, Anurag Koul, Jonathan Dodge, Alan Fern

TL;DR

The paper addresses the opacity of deep RL agents by introducing a perturbation-based saliency framework that generates interpretable saliency maps for both policy and value networks in vision-based Atari agents. Using A3C-trained agents across six games, it reveals how attention shifts during learning, and it demonstrates the method’s utility for diagnosing robust strategies, detecting overfitting, and debugging underperforming policies. It also shows that saliency can help non-experts reason about agent behavior and discusses memory's role in decision-making. The work represents a step toward human-friendly explanations in deep RL, while highlighting that a combination of explanatory tools will be needed for comprehensive understanding and trust.
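The perturbation idea can be written compactly. The notation below is an assumed formulation, since this summary does not reproduce the paper's equations: blur frame $I_t$ with a Gaussian filter $A(I_t, \sigma_A)$, blend the blurred copy into $I_t$ through a Gaussian mask $M(i, j)$ centered on pixel $(i, j)$, and measure how much the actor's unnormalized outputs $\pi_u$ and the critic's value $V_\pi$ change.

```latex
\begin{aligned}
\Phi(I_t, i, j) &= I_t \odot \bigl(1 - M(i, j)\bigr) + A(I_t, \sigma_A) \odot M(i, j) \\
S_\pi(t, i, j)  &= \tfrac{1}{2} \bigl\lVert \pi_u(I_{1:t}) - \pi_u(I'_{1:t}) \bigr\rVert^2 \\
S_V(t, i, j)    &= \tfrac{1}{2} \bigl( V_\pi(I_{1:t}) - V_\pi(I'_{1:t}) \bigr)^2
\end{aligned}
```

Here $\odot$ is elementwise multiplication, and $I'_{1:t}$ equals the frame history $I_{1:t}$ except that frame $t$ is replaced by $\Phi(I_t, i, j)$. A large $S_\pi$ or $S_V$ means that blurring the neighborhood of $(i, j)$ noticeably changes the actor's or the critic's output.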

Abstract

While deep reinforcement learning (deep RL) agents are effective at maximizing rewards, it is often unclear what strategies they use to do so. In this paper, we take a step toward explaining deep RL agents through a case study using Atari 2600 environments. In particular, we focus on using saliency maps to understand how an agent learns and executes a policy. We introduce a method for generating useful saliency maps and use it to show 1) what strong agents attend to, 2) whether agents are making decisions for the right or wrong reasons, and 3) how agents evolve during learning. We also test our method on non-expert human subjects and find that it improves their ability to reason about these agents. Overall, our results show that saliency information can provide significant insight into an RL agent's decisions and learning behavior.

Paper Structure

This paper contains 12 sections, 3 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Comparison of Jacobian saliency to our perturbation-based approach. We visualize an actor-critic model (Mnih et al., 2016). Red indicates saliency for the critic; blue indicates saliency for the actor.
  • Figure 2: An example of how our perturbation method selectively blurs a region, applied to a cropped frame of Breakout.
  • Figure 3: Visualizing strong Atari 2600 policies. We use an actor-critic network; the actor's saliency map is blue and the critic's saliency map is red. White arrows denote motion of the ball.
  • Figure 4: Visualizing learning. Frames are chosen from games played by fully trained agents, and saliency is shown for agents at successive stages of training: leftmost agents are untrained, rightmost agents are fully trained. Each column is separated by ten million frames of training. White arrows denote the velocity of the ball.
  • Figure 5: Visualizing overfit Atari policies. Grey boxes denote the hint pixels. White arrows denote motion of the ball.
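The blur-and-compare procedure behind Figures 1 and 2 can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it uses a single grayscale frame instead of the frame history a recurrent agent sees, a peak-1 Gaussian mask, a squared-difference score, and illustrative defaults (mask centers every 5 pixels, mask sigma 5, blur sigma 3). The names `saliency_map` and `toy_logits` are hypothetical; `toy_logits` stands in for a real policy network.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D array, padding by repeating edge pixels."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # Convolve every row, then every column; 'valid' mode trims the padding off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def gaussian_mask(shape, i, j, sigma):
    """Assumed mask M(i, j): a 2-D Gaussian centered at (i, j) with peak value 1."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((ys - i) ** 2 + (xs - j) ** 2) / (2.0 * sigma ** 2))


def perturb(img, blurred, i, j, sigma_mask):
    """Blend the blurred frame into the original around pixel (i, j)."""
    m = gaussian_mask(img.shape, i, j, sigma_mask)
    return img * (1.0 - m) + blurred * m


def saliency_map(img, score_fn, stride=5, sigma_mask=5.0, sigma_blur=3.0):
    """Coarse map of 0.5 * ||f(I) - f(I')||^2 with mask centers every `stride` pixels.

    `score_fn` may return a vector (actor logits) or a scalar (critic value).
    """
    blurred = gaussian_blur(img, sigma_blur)
    base = np.atleast_1d(score_fn(img))
    rows = range(0, img.shape[0], stride)
    cols = range(0, img.shape[1], stride)
    sal = np.zeros((len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            out = np.atleast_1d(score_fn(perturb(img, blurred, i, j, sigma_mask)))
            sal[a, b] = 0.5 * np.sum((base - out) ** 2)
    return sal


# Hypothetical toy "policy": two logits that each read a single pixel.
def toy_logits(x):
    return np.array([x[30, 30], x[10, 50]])


frame = np.zeros((80, 80))
frame[28:33, 28:33] = 1.0  # a bright 5x5 "ball"
sal = saliency_map(frame, toy_logits)
print(sal.shape)  # (16, 16): one score per mask center
```

Blurring near the ball changes the first logit, so the map peaks at the mask center (30, 30), while regions far from the ball score near zero. Passing a critic that returns a scalar instead of `toy_logits` gives the value map; `np.atleast_1d` lets one function serve both. For display, the coarse map would be upsampled to the frame size and overlaid in color.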