Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management

M. Saifullah; K. G. Papakonstantinou; C. P. Andriotis; S. M. Stoffels

TL;DR

The paper tackles life-cycle inspection and maintenance planning for large transportation networks under uncertainty by casting the problem as a constrained POMDP and solving it with a scalable multi-agent deep reinforcement learning (DRL) approach. It introduces DDMAC-CTDE, a centralized-training, decentralized-execution framework that assigns one agent per component and uses a centralized critic to guide learning, enabling near-optimal cross-asset decisions. The authors demonstrate substantial cost savings over Condition-Based Maintenance (CBM) and Virginia Department of Transportation (VDOT) policy baselines on a detailed Hampton Roads network, while meeting hard budget constraints and soft performance constraints. The work advances practical, constraint-aware DRL for infrastructure management and provides a comprehensive modeling environment that links pavements and bridges through the Critical Condition Index (CCI) and International Roughness Index (IRI), nonstationary gamma-process deterioration, and network-wide risk metrics.

Abstract

We present a multi-agent Deep Reinforcement Learning (DRL) framework for managing large transportation infrastructure systems over their life-cycle. Life-cycle management of such engineering systems is a computationally intensive task, requiring appropriate sequential inspection and maintenance decisions able to reduce long-term risks and costs, while dealing with different uncertainties and constraints that lie in high-dimensional spaces. To date, static age- or condition-based maintenance methods and risk-based or periodic inspection plans have mostly addressed this class of optimization problems. However, optimality, scalability, and uncertainty limitations are often manifested under such approaches. The optimization problem in this work is cast in the framework of constrained Partially Observable Markov Decision Processes (POMDPs), which provides a comprehensive mathematical basis for stochastic sequential decision settings with observation uncertainties, risk considerations, and limited resources. To address significantly large state and action spaces, a Deep Decentralized Multi-agent Actor-Critic (DDMAC) DRL method with Centralized Training and Decentralized Execution (CTDE), termed DDMAC-CTDE, is developed. The performance strengths of the DDMAC-CTDE method are demonstrated in a generally representative and realistic example application of an existing transportation network in Virginia, USA. The network includes several bridge and pavement components with nonstationary degradation, agency-imposed constraints, and traffic delay and risk considerations. Compared to traditional management policies for transportation networks, the proposed DDMAC-CTDE method vastly outperforms its counterparts. Overall, the proposed algorithmic framework provides near-optimal solutions for transportation infrastructure management under real-world constraints and complexities.
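
The POMDP setting mentioned in the abstract means that each component's condition is tracked as a belief (a probability distribution over condition states) rather than as a known state. As a minimal sketch of how such a belief could be updated after one decision step, the following applies the standard discrete Bayes filter. The three-state matrices, the function name `belief_update`, and all numbers are illustrative assumptions and are not taken from the paper.

```python
def belief_update(belief, transition, observation, obs_index):
    """One Bayes-filter step for a discrete POMDP under a fixed action.

    belief[i]          : P(state = i) before the step
    transition[i][j]   : P(next state = j | state = i), for the chosen action
    observation[j][o]  : P(observation = o | next state = j), for the chosen action
    obs_index          : index of the observation actually received
    """
    n = len(belief)
    # Prediction: propagate the belief through the deterioration model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = [predicted[j] * observation[j][obs_index] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


# Hypothetical 3-state component (0 = good, 1 = fair, 2 = poor); values are made up.
belief = [1.0, 0.0, 0.0]
transition = [
    [0.8, 0.2, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
observation = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]

# The inspection reports "fair" (index 1), so belief mass shifts toward state 1.
updated = belief_update(belief, transition, observation, obs_index=1)
print(updated)
```

In a multi-agent setting of the kind the abstract describes, a vector like `updated` would plausibly be part of each agent's input, but how the paper actually encodes beliefs for its actors and critic is not stated in this summary.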

Paper Structure (40 sections, 34 equations, 12 figures, 32 tables, 3 algorithms)

Introduction
Background
Partially Observable Markov Decision Processes
Deep Reinforcement Learning
DDMAC-CTDE
Formulation
Constrained Optimization
Objective function with constraints
Costs related to infrastructure management
DDMAC-CTDE formulation with constraints
Pavement Modeling
Critical Condition Index (CCI)
Fitting a nonstationary gamma process
Determining transition probabilities
Observation probabilities for CCI inspection action
...and 25 more sections

Figures (12)

Figure 1: Constrained Deep Decentralized Multi-agent Actor Critic (DDMAC) with Centralized Training Decentralized Execution (CTDE) architecture.
Figure 2: Modeled mean CCI for different levels of traffic.
Figure 3: (a) Fitted gamma model, (b) Scatter plot for CCI corresponding to traffic level A.
Figure 4: Transition probabilities for Traffic level A, with (a) starting state = 6, (b) starting state = 5, smoothed over time.
Figure 5: Transition probabilities in time, moving from state 9 (left) and 8 (right) to lower states.
...and 7 more figures
