Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks

Timon Sachweh; Pierre Haritz; Thomas Liebig

TL;DR

This work tackles trust and safety in reinforcement learning for real‑world control by integrating Petri nets as a constraint mechanism that augments state information, restricts actions, and enables verifiability. The authors formalize a State‑Enhanced CMDP (SE‑MDP) and introduce the RLPN architecture with a PN‑driven constraint update, yielding the PN‑CDQN algorithm that enforces PN constraints within the Q‑learning loop. Empirical evaluation on a four‑way traffic junction shows PN‑CDQN outperforms cycle‑based baselines and a vanilla DQN, achieving zero constraint violations and favorable AJWT metrics while maintaining robust learning progression. The approach offers verifiable, constraint‑aware RL suitable for process‑model‑based domains and promises broad applicability, with future work on decentralized MARL, privacy in communication, and extensions to Colored/Timed PNs.

Abstract

The lack of trust in algorithms is usually an issue when using Reinforcement Learning (RL) agents for control in real-world domains such as production plants, autonomous vehicles, or traffic-related infrastructure, partly due to the lack of verifiability of the model itself. In such scenarios, Petri nets (PNs) are often available for flowcharts or process steps, as they are versatile and standardized. In order to facilitate integration of RL models and as a step towards increasing AI trustworthiness, we propose an approach that uses PNs with three main advantages over typical RL approaches: Firstly, the agent can now easily be modeled with a combined state including both external environmental observations and agent-specific state information from a given PN. Secondly, we can enforce constraints for state-dependent actions through the inherent PN model. And lastly, we can increase trustworthiness by verifying PN properties through techniques such as model checking. We test our approach on a typical four-way intersection traffic light control setting and present our results, beating cycle-based baselines.
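
The first two advantages named in the abstract (a combined state, and state-dependent action constraints) can be illustrated with a minimal sketch. It assumes a hypothetical three-place toy net for a two-phase traffic light, not the net shown in the paper's Figure 3. The names `PetriNet`, `action_mask`, `masked_greedy`, `augmented_state`, and `make_toy_net` are illustrative and do not come from the paper. The Q-values are made-up numbers standing in for a trained network's output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PetriNet:
    """Place/transition net with integer arc weights (illustrative only)."""
    marking: dict[str, int]          # tokens currently on each place
    pre: dict[str, dict[str, int]]   # transition -> tokens consumed per place
    post: dict[str, dict[str, int]]  # transition -> tokens produced per place

    def enabled(self, t: str) -> bool:
        # A transition is enabled if every input place holds enough tokens.
        return all(self.marking.get(p, 0) >= n for p, n in self.pre[t].items())

    def fire(self, t: str) -> None:
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p, n in self.pre[t].items():
            self.marking[p] -= n
        for p, n in self.post[t].items():
            self.marking[p] = self.marking.get(p, 0) + n

    def action_mask(self, transitions: list[str]) -> list[bool]:
        # One entry per action: True if the matching transition may fire.
        return [self.enabled(t) for t in transitions]


def masked_greedy(q_values: list[float], mask: list[bool]) -> int:
    """Pick the highest-valued action among those the net allows."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise RuntimeError("no enabled transition (the net is deadlocked)")
    return max(allowed, key=lambda i: q_values[i])


def augmented_state(observation: list[float], net: PetriNet,
                    places: list[str]) -> list[float]:
    """Concatenate the environment observation with the net's marking."""
    return list(observation) + [net.marking[p] for p in places]


# Hypothetical toy net: only one direction can be green at a time.
PLACES = ["all_red", "ns_green", "ew_green"]
TRANSITIONS = ["ns_on", "ns_off", "ew_on", "ew_off"]


def make_toy_net() -> PetriNet:
    return PetriNet(
        marking={"all_red": 1, "ns_green": 0, "ew_green": 0},
        pre={"ns_on": {"all_red": 1}, "ns_off": {"ns_green": 1},
             "ew_on": {"all_red": 1}, "ew_off": {"ew_green": 1}},
        post={"ns_on": {"ns_green": 1}, "ns_off": {"all_red": 1},
              "ew_on": {"ew_green": 1}, "ew_off": {"all_red": 1}},
    )


if __name__ == "__main__":
    net = make_toy_net()
    q = [0.1, 0.9, 0.5, 0.2]  # made-up Q-values, one per transition
    action = masked_greedy(q, net.action_mask(TRANSITIONS))
    net.fire(TRANSITIONS[action])
    print(TRANSITIONS[action], augmented_state([3.0, 1.0], net, PLACES))
```

In this toy setup the unconstrained argmax would pick `ns_off`, which is not enabled while all lights are red, so the mask steers the choice to an enabled transition instead. Whether the paper applies its constraints by masking at action selection or by another mechanism inside the Q-learning loop is not stated in the text above, so the masking step here is an assumption.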

Paper Structure (30 sections, 8 equations, 7 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
Petri nets
Reinforcement Learning
Markov Decision Processes
Constrained Markov Decision Processes
Deep Q Learning
Methodology
Problem Setting Formulation
RLPN Architecture
Constraint Enforcement
Gymnasium Environment Wrapper
Environment Definition and Parsing
Experiments
Experimental Setup
...and 15 more sections

Figures (7)

Figure 1: Proposed Constrained Reinforcement Learning closed-loop design. The dynamic wrapper contains both the environment and the Petri net and processes inputs from and outputs to the agent.
Figure 2: Junction scenario
Figure 3: Petri net for traffic light constraints. North (n), East (e), South (s), and West (w) denote the lane directions.
Figure 4: Two training executions each for the DQN and PN-CDQN agents. The best runs, based on simulation timestep, are chosen from all 256 training runs.
Figure 5: Evaluation results of reached simulation timesteps for intermediate trained agents. Each data point represents a training step increase of 100,000 steps. The two highest-scoring agents of both DQN and PN-CDQN are shown.
...and 2 more figures

Theorems & Definitions (2)

Definition 1
Definition 2
