Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

Dennis Gross; Helge Spieker

Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

Dennis Gross, Helge Spieker

TL;DR

This paper tackles safety and interpretability in deep reinforcement learning by fusing policy model checking with co-activation graph analysis. It generates safety-labeled state datasets via probabilistic model checking and applies neuron-activation graphs to extract global-to-local explanations of safe decision-making, yielding semi-global insights. Key contributions include a practical methodology for labeling RL states using $PCTL$-based verification, and demonstrating how PageRank and Louvain community detection reveal influential neurons and modular structure tied to safety properties. The approach provides a scalable path to understand and trust RL policies in safety-critical contexts, with empirical validation in taxi and cleaning-robot domains. Future work points to extending this framework to multi-agent settings and safe neural pruning.

Abstract

Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors and are challenging to interpret. To address these challenges, we combine RL policy model checking--a technique for determining whether RL policies exhibit unsafe behaviors--with co-activation graph analysis--a method that maps neural network inner workings by analyzing neuron activation patterns--to gain insight into the safe RL policy's sequential decision-making. This combination lets us interpret the RL policy's inner workings for safe decision-making. We demonstrate its applicability in various experiments.

Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

TL;DR

-based verification, and demonstrating how PageRank and Louvain community detection reveal influential neurons and modular structure tied to safety properties. The approach provides a scalable path to understand and trust RL policies in safety-critical contexts, with empirical validation in taxi and cleaning-robot domains. Future work points to extending this framework to multi-agent settings and safe neural pruning.

Abstract

Paper Structure (36 sections, 5 equations, 4 figures)

This paper contains 36 sections, 5 equations, 4 figures.

INTRODUCTION
Related Work
Explainable NN Methods
Formal Verification of RL Policies
Formal Verification and Explainability
Background
Probabilistic Model Checking
Explainable Reinforcement Learning
Critical state
Co-activation Graph Analysis
PageRank.
Louvain Community Detection.
Methodolodgy
Labeled Dataset Generation
Option 1
...and 21 more sections

Figures (4)

Figure 1: Model checking workflow DBLP:journals/sttt/HenselJKQV22. First, the system needs to be formally modeled, for instance, via PRISM. Then, the requirements are formalized, for instance, via PCTL. Eventually, both are inputted into the model checker, like Storm, which verifies the property.
Figure 2: This diagram represents an RL system in which an agent interacts with an environment. The agent receives a state and a reward from the environment based on its previous action. The agent then uses this information to select the next action, which it sends to the environment.
Figure 3: The 50 most significant neurons identified for each safety property.
Figure 4: The 50 most significant neurons for the safety property $P_{=1}(\lozenge jobs=2)$ identified for critical and non-critical states.

Theorems & Definitions (2)

Definition 1: MDP
Definition 2

Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

TL;DR

Abstract

Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

Authors

TL;DR

Abstract

Table of Contents

Figures (4)

Theorems & Definitions (2)