REACT: Revealing Evolutionary Action Consequence Trajectories for Interpretable Reinforcement Learning

Philipp Altmann, Céline Davignon, Maximilian Zorn, Fabian Ritz, Claudia Linnhoff-Popien, Thomas Gabor

TL;DR

REACT tackles the interpretability gap in reinforcement learning by generating a diverse set of edge-case demonstrations through controlled disturbances of the initial state and an evolutionary search over initial conditions. It introduces a joint fitness that combines local trajectory diversity, global diversity across demonstrations, and action-certainty to guide the evolution of informative demonstrations. Across gridworld and continuous-control tasks, REACT unveils nuanced policy behaviors not apparent from optimal trajectories alone and demonstrates robustness in revealing potential vulnerabilities. This approach provides a practical, model-agnostic tool for human-in-the-loop policy inspection and could inform adversarial curricula or causality-based interpretability analyses in real-world RL deployments.

Abstract

To enhance the interpretability of Reinforcement Learning (RL), we propose Revealing Evolutionary Action Consequence Trajectories (REACT). In contrast to the prevalent practice of validating RL models based on their optimal behavior learned during training, we posit that considering a range of edge-case trajectories provides a more comprehensive understanding of their inherent behavior. To induce such scenarios, we introduce a disturbance to the initial state, optimizing it through an evolutionary algorithm to generate a diverse population of demonstrations. To evaluate the fitness of trajectories, REACT incorporates a joint fitness function that encourages both local and global diversity in the encountered states and chosen actions. Through assessments with policies trained for varying durations in discrete and continuous environments, we demonstrate the descriptive power of REACT. Our results highlight its effectiveness in revealing nuanced aspects of RL models' behavior beyond optimal performance, thereby contributing to improved interpretability.
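
The search described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 5x5 grid, the hand-written stand-in `policy`, the concrete definitions of local diversity, global diversity, and certainty, and the unweighted sum used as joint fitness. The paper's actual fitness terms and their combination may differ.

```python
import random

SIZE = 5                                  # hypothetical 5x5 grid
GOAL = (4, 4)                             # hypothetical goal cell
MOVES = {"right": (1, 0), "up": (0, 1)}

def policy(state):
    """Stand-in for a trained policy: returns (action, probability of that action)."""
    x, y = state
    if x < GOAL[0] and y < GOAL[1]:
        # Both moves make progress, so this toy policy is modelled as undecided.
        return ("right" if (x + y) % 2 == 0 else "up"), 0.5
    return ("right", 1.0) if x < GOAL[0] else ("up", 1.0)

def rollout(start, max_steps=20):
    """Run the policy from a (disturbed) initial state; return visited states and action probabilities."""
    states, probs = [start], []
    while states[-1] != GOAL and len(probs) < max_steps:
        action, prob = policy(states[-1])
        dx, dy = MOVES[action]
        x, y = states[-1]
        states.append((x + dx, y + dy))
        probs.append(prob)
    return states, probs

def local_diversity(states):
    """Assumed definition: fraction of grid cells covered by one trajectory."""
    return len(set(states)) / (SIZE * SIZE)

def global_diversity(states, others):
    """Assumed definition: smallest Jaccard distance to any other trajectory's state set."""
    if not others:
        return 1.0
    mine = set(states)
    return min(1 - len(mine & set(o)) / len(mine | set(o)) for o in others)

def certainty(probs):
    """Assumed definition: mean probability the policy assigned to its chosen actions."""
    return sum(probs) / len(probs) if probs else 1.0

def joint_fitness(idx, rollouts):
    """Assumed combination: unweighted sum of the three terms."""
    states, probs = rollouts[idx]
    others = [s for j, (s, _) in enumerate(rollouts) if j != idx]
    return local_diversity(states) + global_diversity(states, others) + certainty(probs)

def mutate(start, rng):
    """Disturb an initial state by one cell, clipped to the grid."""
    dx, dy = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
    return (min(max(start[0] + dx, 0), SIZE - 1), min(max(start[1] + dy, 0), SIZE - 1))

def evolve(pop_size=6, generations=20, seed=0):
    """Evolve a population of initial states; keep the fittest after each generation."""
    rng = random.Random(seed)
    population = [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(pop_size)]
    for _ in range(generations):
        candidates = population + [mutate(p, rng) for p in population]
        rollouts = [rollout(s) for s in candidates]
        ranked = sorted(range(len(candidates)),
                        key=lambda i: joint_fitness(i, rollouts), reverse=True)
        population = [candidates[i] for i in ranked[:pop_size]]
    return population
```

Calling `evolve()` returns a list of initial states whose rollouts score highest under these assumed fitness terms. In this sketch, duplicate initial states receive a global diversity of zero, which pushes the population toward distinct demonstrations.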

Paper Structure

This paper contains 33 sections, 15 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: Elements of the joint fitness $\mathcal{F}$: local diversity $\mathcal{D}_l$ (light blue), global diversity $\mathcal{D}_g$ (blue), and certainty $\mathcal{C}$ (orange), compared to an exemplary optimal trajectory (white).
  • Figure 2: REACT Architecture
  • Figure 3: REACT Evaluation: Comparison of the final return and length of random and REACT demonstrations of a PPO policy trained for 35k steps in the FlatGrid11 environment. The training performance in the unaltered environment is shown as a solid line. Overall, the plots show that REACT-generated demonstrations are more diverse and more evenly distributed than those from random or static initial states.
  • Figure 4: FlatGrid JointFitness Analysis
  • Figure 5: HoleyGrid Evaluation: Comparison of the final return and length of random and REACT demonstrations of a PPO policy trained for 150k steps in the HoleyGrid11 environment. The training performance in the unaltered environment is shown as a solid line. The plots indicate that REACT generates further edge-case demonstrations compared to random or static initial states.
  • ...and 7 more figures