
Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks

Timon Sachweh, Pierre Haritz, Thomas Liebig

TL;DR

This work tackles trust and safety in reinforcement learning for real‑world control by integrating Petri nets as a constraint mechanism that augments state information, restricts actions, and enables verifiability. The authors formalize a State‑Enhanced CMDP (SE‑MDP) and introduce the RLPN architecture with a PN‑driven constraint update, yielding the PN‑CDQN algorithm that enforces PN constraints within the Q‑learning loop. Empirical evaluation on a four‑way traffic junction shows that PN‑CDQN outperforms cycle‑based baselines and a vanilla DQN, achieving zero constraint violations and favorable AJWT metrics while maintaining robust learning progression. The approach offers verifiable, constraint‑aware RL for domains that already have process models, and the authors expect it to apply broadly. Future work includes decentralized MARL, privacy in communication, and extensions to Colored/Timed PNs.
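The TL;DR says that PN-CDQN enforces PN constraints inside the Q-learning loop. This sketch assumes that happens through action masking, which is not confirmed by the paper. Each action is a PN transition, and the agent may only choose transitions enabled in the current marking. The max in the TD target also ranges only over transitions enabled in the next marking. The names `PetriNet`, `masked_epsilon_greedy` and `masked_td_target` are hypothetical, and the two-transition net is a toy, not the net from Figure 3.

```python
import random
from typing import Dict, List

Marking = Dict[str, int]  # tokens per place


class PetriNet:
    """Minimal place/transition net; each transition is one agent action."""

    def __init__(self, pre: Dict[str, Marking], post: Dict[str, Marking]):
        self.pre = pre    # tokens consumed from input places
        self.post = post  # tokens produced in output places
        self.transitions = list(pre)  # action index -> transition name

    def enabled(self, m: Marking, t: str) -> bool:
        # A transition is enabled iff every input place holds enough tokens.
        return all(m.get(p, 0) >= w for p, w in self.pre[t].items())

    def mask(self, m: Marking) -> List[bool]:
        return [self.enabled(m, t) for t in self.transitions]

    def fire(self, m: Marking, t: str) -> Marking:
        if not self.enabled(m, t):
            raise ValueError(f"transition {t!r} is not enabled")
        nxt = dict(m)
        for p, w in self.pre[t].items():
            nxt[p] = nxt.get(p, 0) - w
        for p, w in self.post[t].items():
            nxt[p] = nxt.get(p, 0) + w
        return nxt


def masked_epsilon_greedy(q: List[float], mask: List[bool],
                          eps: float, rng: random.Random) -> int:
    """Epsilon-greedy selection that never proposes a disabled transition."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("no enabled transition (PN deadlock)")
    if rng.random() < eps:
        return rng.choice(allowed)  # explore among enabled actions only
    return max(allowed, key=lambda a: q[a])  # exploit among enabled actions only


def masked_td_target(r: float, gamma: float, next_q: List[float],
                     next_mask: List[bool], done: bool) -> float:
    """Q-learning target whose max ranges only over next-enabled actions."""
    if done:
        return r
    allowed = [qv for qv, ok in zip(next_q, next_mask) if ok]
    if not allowed:
        raise ValueError("no enabled transition in next marking")
    return r + gamma * max(allowed)


# Usage with a hypothetical two-phase signal net.
net = PetriNet(
    pre={"switch_to_ew": {"ns_green": 1}, "switch_to_ns": {"ew_green": 1}},
    post={"switch_to_ew": {"ew_green": 1}, "switch_to_ns": {"ns_green": 1}},
)
m0 = {"ns_green": 1, "ew_green": 0}
# Action 1 has the higher Q-value but is disabled in m0, so action 0 is chosen.
a = masked_epsilon_greedy([0.1, 5.0], net.mask(m0), eps=0.0, rng=random.Random(0))
m1 = net.fire(m0, net.transitions[a])
target = masked_td_target(-2.0, 0.5, [3.0, 1.0], net.mask(m1), done=False)
```

Without the mask, the target in the usage lines would be -0.5, because it would bootstrap from the disabled action's Q-value of 3.0. With the mask, the target is -1.5. Masking both the choice and the target keeps the agent from learning values for moves that the net forbids.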

Abstract

The lack of trust in algorithms is usually an issue when Reinforcement Learning (RL) agents are used for control in real-world domains such as production plants, autonomous vehicles, or traffic-related infrastructure, partly because the model itself lacks verifiability. In such scenarios, Petri nets (PNs) are often already available to describe flowcharts or process steps, since they are versatile and standardized. To facilitate the integration of RL models, and as a step towards increasing AI trustworthiness, we propose an approach that uses PNs and has three main advantages over typical RL approaches. First, the agent can easily be modeled with a combined state that includes both external environmental observations and agent-specific state information from a given PN. Second, constraints on state-dependent actions can be enforced through the inherent PN model. Lastly, trustworthiness can be increased by verifying PN properties with techniques such as model checking. We test our approach on a typical four-way intersection traffic light control setting and present results in which it beats cycle-based baselines.
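The third advantage is verifying PN properties. For bounded nets, the most basic form of model checking is explicit-state reachability analysis. The sketch below enumerates every reachable marking breadth-first, checks a safety predicate on each one, and records deadlocks. The helper names `successors` and `verify` and the three-place signal net are hypothetical. The paper does not say which model checker or properties the authors use.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Marking = Tuple[int, ...]  # tokens per place, in the order of `places`
Arcs = Dict[str, Dict[str, int]]  # transition -> {place: weight}


def successors(places: List[str], pre: Arcs, post: Arcs,
               m: Marking) -> Iterator[Tuple[str, Marking]]:
    """Yield (transition, next_marking) for every transition enabled in m."""
    idx = {p: i for i, p in enumerate(places)}
    for t in pre:
        if all(m[idx[p]] >= w for p, w in pre[t].items()):
            nxt = list(m)
            for p, w in pre[t].items():
                nxt[idx[p]] -= w
            for p, w in post[t].items():
                nxt[idx[p]] += w
            yield t, tuple(nxt)


def verify(places: List[str], pre: Arcs, post: Arcs, m0: Marking,
           safe: Callable[[Marking], bool], limit: int = 100_000
           ) -> Tuple[Optional[Marking], List[Marking], int]:
    """Breadth-first exploration of the reachability graph.

    Returns (first unsafe marking or None, deadlocked markings, states seen).
    Stops at the first unsafe marking; raises once `limit` states are
    exceeded, which happens if the net is unbounded or simply too large.
    """
    seen = {m0}
    queue = deque([m0])
    deadlocks: List[Marking] = []
    while queue:
        m = queue.popleft()
        if not safe(m):
            return m, deadlocks, len(seen)
        succ = [n for _, n in successors(places, pre, post, m)]
        if not succ:
            deadlocks.append(m)  # no transition can fire here
        for n in succ:
            if n not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("state limit reached; net may be unbounded")
                seen.add(n)
                queue.append(n)
    return None, deadlocks, len(seen)


# Hypothetical signal net: a token in all_red may move to exactly one axis.
places = ["ns_green", "ew_green", "all_red"]
pre = {"ns_on": {"all_red": 1}, "ns_off": {"ns_green": 1},
       "ew_on": {"all_red": 1}, "ew_off": {"ew_green": 1}}
post = {"ns_on": {"ns_green": 1}, "ns_off": {"all_red": 1},
        "ew_on": {"ew_green": 1}, "ew_off": {"all_red": 1}}


def no_conflict(m: Marking) -> bool:
    return not (m[0] > 0 and m[1] > 0)  # never both axes green at once


print(verify(places, pre, post, (0, 0, 1), no_conflict))     # (None, [], 3)
print(verify(places, pre, post, (0, 0, 2), no_conflict)[0])  # (1, 1, 0)
```

With one token in `all_red`, mutual exclusion holds in all three reachable markings and no deadlock exists. With two initial tokens, the search returns (1, 1, 0), a marking in which both axes are green, so the net would be rejected. For larger or unbounded nets, plain enumeration hits the state limit, and a dedicated tool would be needed.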

Paper Structure (30 sections, 8 equations, 7 figures, 1 table, 1 algorithm)

Figures (7)

  • Figure 1: Proposed Constrained Reinforcement Learning closed-loop design. The dynamic wrapper contains both the environment and the Petri net and processes inputs from and outputs to the agent.
  • Figure 2: Junction scenario
  • Figure 3: Petri net for traffic lights constraints. North (n), East (e), South (s), and West (w) denote the lane directions.
  • Figure 4: Two training runs each for the DQN and PN-CDQN agents. The best runs, judged by the simulation timestep reached, are selected from all 256 training runs.
  • Figure 5: Evaluation results: simulation timesteps reached by agents at intermediate stages of training. Each data point corresponds to a further 100,000 training steps. The two highest-scoring agents of both DQN and PN-CDQN are shown.
  • ...and 2 more figures
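Figure 1 describes a dynamic wrapper that contains both the environment and the Petri net and mediates between them and the agent. Below is a minimal sketch of that closed loop. It assumes a hypothetical toy environment, `ToyJunction`, with one queue per approach (n, e, s, w, as in Figure 3), plus a two-phase net that is far simpler than the one in Figure 3. The state returned to the agent is the environment observation followed by the PN marking. This is one plausible reading of the SE-MDP's combined state, not the authors' encoding, and `PNWrapper` is likewise a hypothetical name.

```python
from typing import Dict, List, Tuple

DIRS = ("n", "e", "s", "w")


class ToyJunction:
    """Hypothetical stand-in environment: one queue per approach.

    Each step, one car arrives on every approach and each green approach
    discharges one car. The observation is the four queue lengths.
    """

    def __init__(self) -> None:
        self.queues = {d: 0 for d in DIRS}

    def step(self, green: Tuple[str, ...]) -> Tuple[List[float], float]:
        for d in DIRS:
            self.queues[d] += 1
            if d in green:
                self.queues[d] -= 1
        obs = [float(self.queues[d]) for d in DIRS]
        return obs, -sum(obs)  # reward: fewer waiting cars is better


class PNWrapper:
    """Wraps environment and Petri net; the agent only talks to this."""

    def __init__(self, env: ToyJunction, pre: Dict[str, Dict[str, int]],
                 post: Dict[str, Dict[str, int]], m0: Dict[str, int],
                 green_of: Dict[str, Tuple[str, ...]]) -> None:
        self.env, self.pre, self.post = env, pre, post
        self.marking = dict(m0)          # m0 must list every place
        self.places = list(m0)           # fixed order for the state vector
        self.transitions = list(pre)     # action index -> transition
        self.green_of = green_of         # phase place -> green approaches

    def action_mask(self) -> List[bool]:
        return [all(self.marking[p] >= w for p, w in self.pre[t].items())
                for t in self.transitions]

    def step(self, action: int) -> Tuple[List[float], float]:
        if not self.action_mask()[action]:
            raise ValueError(f"action {action} violates the Petri net")
        t = self.transitions[action]
        for p, w in self.pre[t].items():
            self.marking[p] -= w
        for p, w in self.post[t].items():
            self.marking[p] += w
        green = tuple(d for p, ds in self.green_of.items()
                      if self.marking[p] > 0 for d in ds)
        obs, reward = self.env.step(green)
        # Combined state: environment observation followed by PN marking.
        return obs + [float(self.marking[p]) for p in self.places], reward


pre = {"keep_ns": {"ns_green": 1}, "switch_to_ew": {"ns_green": 1},
       "keep_ew": {"ew_green": 1}, "switch_to_ns": {"ew_green": 1}}
post = {"keep_ns": {"ns_green": 1}, "switch_to_ew": {"ew_green": 1},
        "keep_ew": {"ew_green": 1}, "switch_to_ns": {"ns_green": 1}}
wrapped = PNWrapper(ToyJunction(), pre, post, {"ns_green": 1, "ew_green": 0},
                    {"ns_green": ("n", "s"), "ew_green": ("e", "w")})
print(wrapped.action_mask())  # [True, True, False, False]
state, reward = wrapped.step(0)  # keep north-south green
print(state, reward)  # [0.0, 1.0, 0.0, 1.0, 1.0, 0.0] -2.0
```

This sketch raises an error on a forbidden action so that violations are visible. Paired with an agent that masks its choices as the PN allows, that branch should never trigger.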

Theorems & Definitions (2)

  • Definition 1
  • Definition 2