Table of Contents
Fetching ...

Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking

Haoyu Wang, Christopher M. Poskitt, Jun Sun, Jiali Wei

TL;DR

Pro2Guard tackles the safety of LLM-powered agents by predicting multi-step risk and enforcing safety proactively. It learns a domain-specific DTMC from execution traces through predicate abstraction and Laplace smoothing, and uses CTL specifications with $(\varepsilon,\delta)$-PAC guarantees to quantify future violations. Runtime enforcement triggers risk-aware interventions when the predicted safety probability falls below a threshold $\theta$, with options for user-in-the-loop or safe-stop prompts, achieving a balance between safety and task performance. Empirical results in autonomous vehicles and embodied agents show improved safety with modest overhead and reduced manual rule-writing, demonstrating a principled, data-driven path to trustworthy LLM agents in stochastic environments.

Abstract

Large Language Model (LLM) agents demonstrate strong autonomy, but their stochastic behavior introduces unpredictable safety risks. Existing rule-based enforcement systems, such as AgentSpec, are reactive, intervening only when unsafe behavior is imminent or has occurred, lacking foresight for long-horizon dependencies. To overcome these limitations, we present a proactive runtime enforcement framework for LLM agents. The framework abstracts agent behaviors into symbolic states and learns a Discrete-Time Markov Chain (DTMC) from execution traces. At runtime, it predicts the probability of leading to undesired behaviors and intervenes before violations occur when the estimated risk exceeds a user-defined threshold. Designed to provide PAC-correctness guarantee, the framework achieves statistically reliable enforcement of agent safety. We evaluate the framework across two safety-critical domains: autonomous vehicles and embodied agents. It proactively enforces safety and maintains high task performance, outperforming existing methods.

Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking

TL;DR

Pro2Guard tackles the safety of LLM-powered agents by predicting multi-step risk and enforcing safety proactively. It learns a domain-specific DTMC from execution traces through predicate abstraction and Laplace smoothing, and uses CTL specifications with -PAC guarantees to quantify future violations. Runtime enforcement triggers risk-aware interventions when the predicted safety probability falls below a threshold , with options for user-in-the-loop or safe-stop prompts, achieving a balance between safety and task performance. Empirical results in autonomous vehicles and embodied agents show improved safety with modest overhead and reduced manual rule-writing, demonstrating a principled, data-driven path to trustworthy LLM agents in stochastic environments.

Abstract

Large Language Model (LLM) agents demonstrate strong autonomy, but their stochastic behavior introduces unpredictable safety risks. Existing rule-based enforcement systems, such as AgentSpec, are reactive, intervening only when unsafe behavior is imminent or has occurred, lacking foresight for long-horizon dependencies. To overcome these limitations, we present a proactive runtime enforcement framework for LLM agents. The framework abstracts agent behaviors into symbolic states and learns a Discrete-Time Markov Chain (DTMC) from execution traces. At runtime, it predicts the probability of leading to undesired behaviors and intervenes before violations occur when the estimated risk exceeds a user-defined threshold. Designed to provide PAC-correctness guarantee, the framework achieves statistically reliable enforcement of agent safety. We evaluate the framework across two safety-critical domains: autonomous vehicles and embodied agents. It proactively enforces safety and maintains high task performance, outperforming existing methods.

Paper Structure

This paper contains 22 sections, 1 theorem, 20 equations, 2 figures, 3 tables, 2 algorithms.

Key Result

theorem 1

For any safety CTL property $\psi$, confidence $\delta$, and error $\varepsilon$, Algorithm alg:learn_matrix_ctl is PAC-correct.

Figures (2)

  • Figure 1: A DTMC representing fork and microwave interactions, with the unsafe state $s_3$ highlighted in red. Each node represents a symbolic state, and each edge is annotated with the transition probability (ratio) between states.
  • Figure :

Theorems & Definitions (8)

  • definition 1: Problem Definition
  • definition 2: Computation Tree Logic (CTL)
  • definition 3: Discrete-Time Markov Chain (DTMC)
  • definition 4: Probably Approximately Correct (PAC-correct)
  • theorem 1: PAC-correct of Algorithm \ref{['alg:learn_matrix_ctl']}
  • definition 5: LawBreaker STL Fragment $\mathsf{STL}_{\mathsf{LB}}$ for autonomous vehicle properties
  • definition 6: Auxiliary Monitor for $k$-Bounded Response
  • proof