Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning

Charlotte Beylier, Hannah Selder, Arthur Fleig, Simon M. Hofmann, Nico Scherf

Abstract

While deep reinforcement learning agents demonstrate high performance across domains, their internal decision processes remain difficult to interpret when evaluated only through performance metrics. In particular, it is poorly understood which input features agents rely on, how these dependencies evolve during training, and how they relate to behavior. We introduce a scientific methodology for analyzing the learning process through quantitative analysis of saliency. This approach aggregates saliency information at the object and modality level into hierarchical attention profiles, quantifying how agents allocate attention over time, thereby forming attention trajectories throughout training. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biomechanical user simulations in visuomotor interactive tasks, this methodology uncovers algorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnoses overfitting to redundant sensory channels. These patterns correspond to measurable behavioral differences, demonstrating empirical links between attention profiles, learning dynamics, and agent behavior. To assess robustness of the attention profiles, we validate our findings across multiple saliency methods and environments. The results establish attention trajectories as a promising diagnostic axis for tracing how feature reliance develops during training and for identifying biases and vulnerabilities invisible to performance metrics alone.
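The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes a saliency map is available as a 2D grid of relevance values and that each object comes with a binary pixel mask; the function names `attention_profile` and `attention_trajectory`, the use of absolute relevance, and the normalization by the total over objects are illustrative assumptions, not the authors' implementation.

```python
def attention_profile(saliency, object_masks):
    """Aggregate a 2D saliency map into object-level attention shares.

    saliency: list of rows of relevance values (hypothetical input format).
    object_masks: dict mapping object name -> binary mask of the same shape.
    Returns a dict of shares summing to 1 (assumed normalization), or all
    zeros if no relevance falls on any object.
    """
    scores = {}
    for name, mask in object_masks.items():
        # Sum absolute relevance over the pixels covered by this object.
        scores[name] = sum(
            abs(value)
            for sal_row, mask_row in zip(saliency, mask)
            for value, inside in zip(sal_row, mask_row)
            if inside
        )
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: score / total for name, score in scores.items()}


def attention_trajectory(saliency_per_checkpoint, object_masks):
    """Compute one profile per training checkpoint, in order."""
    return [attention_profile(s, object_masks) for s in saliency_per_checkpoint]


# Toy 2x2 example with made-up relevance values.
saliency = [[3.0, 0.0],
            [0.0, 1.0]]
masks = {
    "ball":   [[1, 0], [0, 0]],
    "paddle": [[0, 0], [0, 1]],
}
profile = attention_profile(saliency, masks)
trajectory = attention_trajectory([saliency, saliency], masks)
```

In this toy case the ball receives three quarters of the object-level relevance and the paddle one quarter. Stacking such profiles over checkpoints gives the kind of trajectory the abstract refers to; a hierarchical version would apply the same aggregation at the modality level.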

Paper Structure

This paper contains 70 sections, 16 equations, 23 figures, 4 tables.

Figures (23)

  • Figure 1: Methodology for measuring hierarchical attention profiles and relating them to agent behavior. a. Illustration of the $h$-profile computation. Saliency maps are derived from the penultimate layer using Layer-wise Relevance Propagation (LRP): a first LRP pass extracts the relevance of the neurons in the $F_c$ layer, and a second pass from the neurons in $F_c$ extracts the relevance in the input. The latter is then aggregated into object-level scores, which are used to derive the attention profile $h$. b. Example of $h$-profile dynamics during training, showing how attention toward different objects evolves. c. Case study 1: comparison of $h$-profiles across learning algorithms in an Atari environment (Breakout shown). Right: behavioral control experiment assessing algorithm robustness to visual perturbations (brick occlusion). d. Case study 2: hierarchical attention patterns under different reward functions and game strategies in custom Pong environments (rewarding ball in red). Right: Dual Ball Discrimination Test measuring behavioral preferences for ball interactions. e. Case study 3: hierarchical attention across sensory modalities (vision and proprioception) in two visuomotor tasks modeled by a muscle-driven biomechanical model of the human upper extremity: pressing the button matching the displayed color, and using a joystick to park a car. The perturbation experiment, in which button positions are shifted during training, is indicated by the green arrow.
  • Figure 2: Attention dynamics reveal algorithm-specific differences in Atari games. a. Breakout game. b. Breakout training curves show performance (top) and hierarchical attention profiles (bottom), tracking the proportion of attention allocated to game objects (ball, paddle, bricks, agent's score (S.A)). Early convergence toward the ball is followed by algorithm-specific divergence, with DQN and QR-DQN reallocating substantial attention to bricks. c. Robustness under perturbations: modifying brick color severely degrades DQN/QR-DQN performance ($\Delta r \rightarrow -1$), while A2C and PPO remain robust; altering irrelevant features (score display, wall) has little effect on any algorithm. d. Learning curves across four Atari games and four learning algorithms. Shaded areas show the standard deviation. e. Dissimilarity analysis shows attention profiles are significantly more similar within algorithms than between algorithms (ANOSIM, ** indicates $p<0.01$).
  • Figure 3: Reward structure shapes distinct attention allocation in Custom Pong. a. Training curves show performance (top) and hierarchical attention profiles (bottom), tracking the proportion of attention allocated to game objects (Ball 1, Ball 2, Agent, Opponent). Agents in the Distractor (v1) version focused primarily on the rewarding ball (B1), whereas agents in the Dual-ball condition (v2) shifted attention to B2. Notably, v1 agents continued to allocate attention to the distractor ball (B2). b. Pong environments with one ball (left) and two balls (right). The white and yellow balls denote B1 and B2, respectively. c. Hierarchical attention profile $h$ at the end of training for the balls (B), the agent's paddle (A), the opponent's paddle (O), the agent's displayed score (S.A), and the opponent's displayed score (S.O). Bars show the average over 50 agents; error bars show the standard deviation. d. Dissimilarity analysis on the $h$-profiles of trained agents from v1 and v2 shows that attention profiles are significantly more similar within a game version than between versions (ANOSIM, ** indicates $p<0.01$). e. In a dual-ball discrimination test, the ball receiving more attention was also the one most frequently interacted with (v0 was added as a reference).
  • Figure 4: Dynamic reallocation of attention across modalities. a. Remote-control car parking task. b. Choice reaction task. c. and d. Performance (top), task success (middle: % parked or % correct button presses), and hierarchical attention profiles (bottom) during training for the remote-control car parking task (c) and the choice reaction task (d). In both tasks, attention to vision increases when reward increases due to visually guided task completion (dotted line). Lines show mean ± SD over 10 agents. e. Agents trained on the choice reaction task with moving buttons pay more attention to the displayed buttons than those trained with static buttons. f. A t-test confirmed a significant difference between these two groups. g. Training performance was higher for agents with static buttons, reflecting the greater difficulty of the moving-button condition.
  • Figure 5: Illustration of extracting the relevance score for the ball object from neuron $k$ in the $F_c$ layer. Only one of the four input frames is shown for clarity. The first LRP pass ($LRP_1$) computes neuron relevance, and the second ($LRP_2$) computes object relevance with respect to neuron $k$.
  • ...and 18 more figures
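
The within-versus-between comparison reported with ANOSIM in Figures 2 and 3 can be sketched as follows. This is a generic, standard-library implementation of the ANOSIM R statistic with a permutation test, not the authors' code; the Euclidean distance between profiles, the helper names, the number of permutations, and the toy profile values are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, labels):
    """ANOSIM R = (mean between-rank - mean within-rank) / (M / 2),
    where M is the number of sample pairs. Assumes at least one
    within-group pair and one between-group pair exist."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, labels, permutations=999, seed=0):
    """Return (R, p), with p estimated by shuffling the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy two-object profiles for two groups of agents (made-up values).
profiles = [[0.80, 0.20], [0.70, 0.30], [0.75, 0.25],
            [0.20, 0.80], [0.30, 0.70], [0.25, 0.75]]
labels = ["PPO", "PPO", "PPO", "DQN", "DQN", "DQN"]
dist = [[math.dist(p, q) for q in profiles] for p in profiles]
r_stat, p_value = anosim(dist, labels)
```

An R close to 1 means profiles are more similar within groups than between them, while an R near 0 means group membership explains little. With only three agents per group, as in this toy example, the permutation test has very few distinct label arrangements, so the p-value cannot become small regardless of how well separated the groups are.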