Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks
Timon Sachweh, Pierre Haritz, Thomas Liebig
TL;DR
This work addresses trust and safety in reinforcement learning for real-world control by integrating Petri nets (PNs) as a constraint mechanism that augments the agent's state information, restricts its actions, and enables verification. The authors formalize a State-Enhanced CMDP (SE-MDP) and introduce the RLPN architecture with a PN-driven constraint update, yielding the PN-CDQN algorithm, which enforces PN constraints within the Q-learning loop. In an empirical evaluation on a four-way traffic junction, PN-CDQN outperforms cycle-based baselines and a vanilla DQN, achieving zero constraint violations and favorable AJWT metrics while maintaining robust learning progress. The approach offers verifiable, constraint-aware RL suited to domains where process models already exist. Future work targets decentralized MARL, privacy in communication, and extensions to Colored and Timed PNs.
Abstract
A lack of trust in algorithms is a common obstacle to using Reinforcement Learning (RL) agents for control in real-world domains such as production plants, autonomous vehicles, or traffic-related infrastructure, partly because the model itself cannot be verified. In such scenarios, Petri nets (PNs) are often already available as descriptions of flowcharts or process steps, as they are versatile and standardized. To facilitate the integration of RL models, and as a step towards increasing AI trustworthiness, we propose an approach that uses PNs and offers three main advantages over typical RL approaches. First, the agent can easily be modeled with a combined state that includes both external environmental observations and agent-specific state information from a given PN. Second, we can enforce constraints on state-dependent actions through the inherent PN model. Third, we can increase trustworthiness by verifying PN properties through techniques such as model checking. We test our approach on a typical four-way intersection traffic light control setting and present our results, which beat cycle-based baselines.
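
The three advantages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four-place net, the place and transition names, the queue-length observation, and the helper functions (enabled, fire, augmented_state, masked_greedy, reachable) are all hypothetical and chosen only for illustration. The sketch assumes that each agent action corresponds to firing one PN transition, that the PN marking is appended to the observation to form the combined state, and that an exhaustive search over reachable markings stands in for model checking.

```python
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    name: str
    inputs: dict   # place -> tokens consumed when firing
    outputs: dict  # place -> tokens produced when firing


# Hypothetical net for one junction: green phases separated by all-red phases.
PLACES = ["ns_green", "red_before_ew", "ew_green", "red_before_ns"]
TRANSITIONS = [
    Transition("hold_ns", {"ns_green": 1}, {"ns_green": 1}),
    Transition("ns_to_red", {"ns_green": 1}, {"red_before_ew": 1}),
    Transition("red_to_ew", {"red_before_ew": 1}, {"ew_green": 1}),
    Transition("hold_ew", {"ew_green": 1}, {"ew_green": 1}),
    Transition("ew_to_red", {"ew_green": 1}, {"red_before_ns": 1}),
    Transition("red_to_ns", {"red_before_ns": 1}, {"ns_green": 1}),
]
INITIAL = {"ns_green": 1, "red_before_ew": 0, "ew_green": 0, "red_before_ns": 0}


def enabled(marking):
    """Indices of transitions whose input places hold enough tokens."""
    return [i for i, t in enumerate(TRANSITIONS)
            if all(marking[p] >= n for p, n in t.inputs.items())]


def fire(marking, action):
    """Return the marking after firing an enabled transition."""
    if action not in enabled(marking):
        raise ValueError(f"transition {TRANSITIONS[action].name} is not enabled")
    t = TRANSITIONS[action]
    new = dict(marking)
    for p, n in t.inputs.items():
        new[p] -= n
    for p, n in t.outputs.items():
        new[p] += n
    return new


def augmented_state(observation, marking):
    """Advantage 1: combine the environment observation with the PN marking."""
    return list(observation) + [marking[p] for p in PLACES]


def masked_greedy(q_values, marking):
    """Advantage 2: pick the best action among enabled transitions only."""
    return max(enabled(marking), key=lambda a: q_values[a])


def reachable(initial):
    """Advantage 3: enumerate all reachable markings by breadth-first search."""
    seen = {tuple(initial[p] for p in PLACES)}
    queue = deque([initial])
    while queue:
        m = queue.popleft()
        for a in enabled(m):
            nxt = fire(m, a)
            key = tuple(nxt[p] for p in PLACES)
            if key not in seen:
                seen.add(key)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    q = [0.1, 0.4, 0.9, 0.2, 0.8, 0.3]  # made-up Q-values, one per transition
    action = masked_greedy(q, INITIAL)   # unmasked argmax would be red_to_ew
    print(TRANSITIONS[action].name)      # prints "ns_to_red"
    marking = fire(INITIAL, action)
    print(augmented_state([3, 0, 5, 1], marking))  # made-up queue lengths
    # Safety property: the two green places never hold a token simultaneously.
    print(all(m[0] + m[2] <= 1 for m in reachable(INITIAL)))  # prints "True"
```

Masking inside the action-selection step means the agent cannot choose a disabled transition, so constraint violations are ruled out by construction rather than penalized through the reward. Because the toy net is small and bounded, the reachability search terminates; for larger nets a dedicated model checker would be the more realistic choice.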
