Table of Contents
Fetching ...

COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

Dennis Gross

TL;DR

COOL-MC is demonstrated as a tool for verifying and explaining RL policies for multi-bridge network maintenance, building on a single-bridge Markov decision process from the literature and extending it to a parallel network of three heterogeneous bridges with a shared periodic budget constraint, encoded in the PRISM modeling language.

Abstract

Aging bridge networks require proactive, verifiable, and interpretable maintenance strategies, yet reinforcement learning (RL) policies trained solely on reward signals provide no formal safety guarantees and remain opaque to infrastructure managers. We demonstrate COOL-MC as a tool for verifying and explaining RL policies for multi-bridge network maintenance, building on a single-bridge Markov decision process (MDP) from the literature and extending it to a parallel network of three heterogeneous bridges with a shared periodic budget constraint, encoded in the PRISM modeling language. We train an RL agent on this MDP and apply probabilistic model checking and explainability methods to the induced discrete-time Markov chain (DTMC) that arises from the interaction between the learned policy and the underlying MDP. Probabilistic model checking reveals that the trained policy has a safety-violation probability of 3.5\% over the planning horizon, being slightly above the theoretical minimum of 0\% and indicating the suboptimality of the learned policy, noting that these results are based on artificially constructed transition probabilities and deterioration rates rather than real-world data, so absolute performance figures should be interpreted with caution. The explainability analysis further reveals, for instance, a systematic bias in the trained policy toward the state of bridge 1 over the remaining bridges in the network. These results demonstrate COOL-MC's ability to provide formal, interpretable, and practical analysis of RL maintenance policies.

COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

TL;DR

COOL-MC is demonstrated as a tool for verifying and explaining RL policies for multi-bridge network maintenance, building on a single-bridge Markov decision process from the literature and extending it to a parallel network of three heterogeneous bridges with a shared periodic budget constraint, encoded in the PRISM modeling language.

Abstract

Aging bridge networks require proactive, verifiable, and interpretable maintenance strategies, yet reinforcement learning (RL) policies trained solely on reward signals provide no formal safety guarantees and remain opaque to infrastructure managers. We demonstrate COOL-MC as a tool for verifying and explaining RL policies for multi-bridge network maintenance, building on a single-bridge Markov decision process (MDP) from the literature and extending it to a parallel network of three heterogeneous bridges with a shared periodic budget constraint, encoded in the PRISM modeling language. We train an RL agent on this MDP and apply probabilistic model checking and explainability methods to the induced discrete-time Markov chain (DTMC) that arises from the interaction between the learned policy and the underlying MDP. Probabilistic model checking reveals that the trained policy has a safety-violation probability of 3.5\% over the planning horizon, being slightly above the theoretical minimum of 0\% and indicating the suboptimality of the learned policy, noting that these results are based on artificially constructed transition probabilities and deterioration rates rather than real-world data, so absolute performance figures should be interpreted with caution. The explainability analysis further reveals, for instance, a systematic bias in the trained policy toward the state of bridge 1 over the remaining bridges in the network. These results demonstrate COOL-MC's ability to provide formal, interpretable, and practical analysis of RL maintenance policies.
Paper Structure (32 sections, 6 equations, 3 figures, 7 tables, 1 algorithm)

This paper contains 32 sections, 6 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: The multi-bridge network maintenance problem. Three heterogeneous bridges span a shared river, each rated on the NBI condition scale (green = Good, amber = Fair, red = Poor). A shared periodic budget ($B_{\max}=10$, reloaded every four years) is allocated by the RL maintenance agent, which observes all bridge conditions and issues joint maintenance actions $\pi \in \{\text{Do Nothing (DN)},\text{Minor Maintenance (MN)},\text{Major Maintenance (MJ)},\\ \text{Replacement (RP)}\}^3$ at every time step.
  • Figure 2: Sequential decision-making loop. The agent receives an observation and reward from the environment, selects an action according to its policy, and the environment transitions to a new state.
  • Figure 3: General model checking workflow DBLP:journals/sttt/HenselJKQV22. The system is formally modeled, the requirements are formalized, and both are input to a model checker such as Storm, which verifies the property.

Theorems & Definitions (4)

  • Definition 1: MDP
  • Definition 2: DTMC
  • Definition 3: Observation
  • Definition 4