Table of Contents
Fetching ...

Probabilistic Model Checking of Stochastic Reinforcement Learning Policies

Dennis Gross, Helge Spieker

TL;DR

This work introduces a method to verify stochastic reinforcement learning (RL) policies, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can be subsequently verified via the model checker Storm.

Abstract

We introduce a method to verify stochastic reinforcement learning (RL) policies. This approach is compatible with any RL algorithm as long as the algorithm and its corresponding environment collectively adhere to the Markov property. In this setting, the future state of the environment should depend solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique, referred to as model checking, with RL, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can be subsequently verified via the model checker Storm. We demonstrate our method's applicability across multiple benchmarks, comparing it to baseline methods called deterministic safety estimates and naive monolithic model checking. Our results show that our method is suited to verify stochastic RL policies.

Probabilistic Model Checking of Stochastic Reinforcement Learning Policies

TL;DR

This work introduces a method to verify stochastic reinforcement learning (RL) policies, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can be subsequently verified via the model checker Storm.

Abstract

We introduce a method to verify stochastic reinforcement learning (RL) policies. This approach is compatible with any RL algorithm as long as the algorithm and its corresponding environment collectively adhere to the Markov property. In this setting, the future state of the environment should depend solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique, referred to as model checking, with RL, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can be subsequently verified via the model checker Storm. We demonstrate our method's applicability across multiple benchmarks, comparing it to baseline methods called deterministic safety estimates and naive monolithic model checking. Our results show that our method is suited to verify stochastic RL policies.
Paper Structure (31 sections, 2 equations, 4 figures, 2 tables)

This paper contains 31 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: This diagram represents an RL system in which an agent interacts with an environment. The agent receives a state and a reward from the environment based on its previous action. The agent then uses this information to select the next action, which it sends to the environment.
  • Figure 2: The MDP $M$ with $S = \{x=1,x=2,x=3,x=4\}$, A={UP,NOP,DOWN}, and $rew \colon S \times A \rightarrow \{1\}$.
  • Figure 3: Induced MDP $\hat{M}$. We add the state-action transition to the MDP for each action with a probability greater than zero $\hat{\pi}(a|s)$. In this example, the chosen actions are UP and DOWN with the corresponding probabilities $\hat{\pi}(UP|x=1) = 0.3$ and $\hat{\pi}(DOWN|x=1) = 0.7$ at state $x=1$.
  • Figure 4: Transitions update for $\hat{D}$. For example, $\hat{Tr}_{\hat{D}}(x=1)(x=2) = Tr(x=1,UP,x=2) \cdot \hat{\pi}(UP\mid x=1) = 0.2 \cdot 0.3 = 0.06$ at state $x=1$. Repeat with step from Figure \ref{['fig:combined_actions']} for all reachable states by the trained RL policy.

Theorems & Definitions (6)

  • Definition 1: Markov Decision Process
  • Definition 2: Discrete-time Markov Chain
  • Definition 3: Deterministic Policy
  • Definition 4: Stochastic Policy
  • Definition 5: Permissive Policy
  • Definition 6: PCTL Syntax