Table of Contents
Fetching ...

Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

Dennis Gross, Helge Spieker

TL;DR

This paper tackles safety and interpretability in deep reinforcement learning by fusing policy model checking with co-activation graph analysis. It generates safety-labeled state datasets via probabilistic model checking and applies neuron-activation graphs to extract global-to-local explanations of safe decision-making, yielding semi-global insights. Key contributions include a practical methodology for labeling RL states using $PCTL$-based verification, and demonstrating how PageRank and Louvain community detection reveal influential neurons and modular structure tied to safety properties. The approach provides a scalable path to understand and trust RL policies in safety-critical contexts, with empirical validation in taxi and cleaning-robot domains. Future work points to extending this framework to multi-agent settings and safe neural pruning.

Abstract

Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors and are challenging to interpret. To address these challenges, we combine RL policy model checking--a technique for determining whether RL policies exhibit unsafe behaviors--with co-activation graph analysis--a method that maps neural network inner workings by analyzing neuron activation patterns--to gain insight into the safe RL policy's sequential decision-making. This combination lets us interpret the RL policy's inner workings for safe decision-making. We demonstrate its applicability in various experiments.

Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

TL;DR

This paper tackles safety and interpretability in deep reinforcement learning by fusing policy model checking with co-activation graph analysis. It generates safety-labeled state datasets via probabilistic model checking and applies neuron-activation graphs to extract global-to-local explanations of safe decision-making, yielding semi-global insights. Key contributions include a practical methodology for labeling RL states using -based verification, and demonstrating how PageRank and Louvain community detection reveal influential neurons and modular structure tied to safety properties. The approach provides a scalable path to understand and trust RL policies in safety-critical contexts, with empirical validation in taxi and cleaning-robot domains. Future work points to extending this framework to multi-agent settings and safe neural pruning.

Abstract

Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors and are challenging to interpret. To address these challenges, we combine RL policy model checking--a technique for determining whether RL policies exhibit unsafe behaviors--with co-activation graph analysis--a method that maps neural network inner workings by analyzing neuron activation patterns--to gain insight into the safe RL policy's sequential decision-making. This combination lets us interpret the RL policy's inner workings for safe decision-making. We demonstrate its applicability in various experiments.
Paper Structure (36 sections, 5 equations, 4 figures)

This paper contains 36 sections, 5 equations, 4 figures.

Figures (4)

  • Figure 1: Model checking workflow DBLP:journals/sttt/HenselJKQV22. First, the system needs to be formally modeled, for instance, via PRISM. Then, the requirements are formalized, for instance, via PCTL. Eventually, both are inputted into the model checker, like Storm, which verifies the property.
  • Figure 2: This diagram represents an RL system in which an agent interacts with an environment. The agent receives a state and a reward from the environment based on its previous action. The agent then uses this information to select the next action, which it sends to the environment.
  • Figure 3: The 50 most significant neurons identified for each safety property.
  • Figure 4: The 50 most significant neurons for the safety property $P_{=1}(\lozenge jobs=2)$ identified for critical and non-critical states.

Theorems & Definitions (2)

  • Definition 1: MDP
  • Definition 2