COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

Dennis Gross

COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

Dennis Gross

TL;DR

COOL-MC is demonstrated as a tool for verifying and explaining RL policies for multi-bridge network maintenance, building on a single-bridge Markov decision process from the literature and extending it to a parallel network of three heterogeneous bridges with a shared periodic budget constraint, encoded in the PRISM modeling language.

Abstract

Aging bridge networks require proactive, verifiable, and interpretable maintenance strategies, yet reinforcement learning (RL) policies trained solely on reward signals provide no formal safety guarantees and remain opaque to infrastructure managers. We demonstrate COOL-MC as a tool for verifying and explaining RL policies for multi-bridge network maintenance, building on a single-bridge Markov decision process (MDP) from the literature and extending it to a parallel network of three heterogeneous bridges with a shared periodic budget constraint, encoded in the PRISM modeling language. We train an RL agent on this MDP and apply probabilistic model checking and explainability methods to the induced discrete-time Markov chain (DTMC) that arises from the interaction between the learned policy and the underlying MDP. Probabilistic model checking reveals that the trained policy has a safety-violation probability of 3.5\% over the planning horizon, being slightly above the theoretical minimum of 0\% and indicating the suboptimality of the learned policy, noting that these results are based on artificially constructed transition probabilities and deterioration rates rather than real-world data, so absolute performance figures should be interpreted with caution. The explainability analysis further reveals, for instance, a systematic bias in the trained policy toward the state of bridge 1 over the remaining bridges in the network. These results demonstrate COOL-MC's ability to provide formal, interpretable, and practical analysis of RL maintenance policies.

COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

TL;DR

Abstract

Paper Structure (32 sections, 6 equations, 3 figures, 7 tables, 1 algorithm)

This paper contains 32 sections, 6 equations, 3 figures, 7 tables, 1 algorithm.

Introduction
Related Work
Background
Probabilistic Systems
Probabilistic Model Checking
Reinforcement Learning
Explainability Methods
Methodology
PRISM MDP Encoding
State space
Actions
Transition function
Reward function
Differences from Wei et al. (2019)
Summary
...and 17 more sections

Figures (3)

Figure 1: The multi-bridge network maintenance problem. Three heterogeneous bridges span a shared river, each rated on the NBI condition scale (green = Good, amber = Fair, red = Poor). A shared periodic budget ($B_{\max}=10$, reloaded every four years) is allocated by the RL maintenance agent, which observes all bridge conditions and issues joint maintenance actions $\pi \in \{\text{Do Nothing (DN)},\text{Minor Maintenance (MN)},\text{Major Maintenance (MJ)},\\ \text{Replacement (RP)}\}^3$ at every time step.
Figure 2: Sequential decision-making loop. The agent receives an observation and reward from the environment, selects an action according to its policy, and the environment transitions to a new state.
Figure 3: General model checking workflow DBLP:journals/sttt/HenselJKQV22. The system is formally modeled, the requirements are formalized, and both are input to a model checker such as Storm, which verifies the property.

Theorems & Definitions (4)

Definition 1: MDP
Definition 2: DTMC
Definition 3: Observation
Definition 4

COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

TL;DR

Abstract

COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

Authors

TL;DR

Abstract

Table of Contents

Figures (3)

Theorems & Definitions (4)