Maintenance Strategies for Sewer Pipes with Multi-State Degradation and Deep Reinforcement Learning
Lisandro A. Jimenez-Roa, Thiago D. Simão, Zaharah Bukhsh, Tiedo Tinga, Hajo Molegraaf, Nils Jansen, Marielle Stoelinga
TL;DR
This work addresses maintenance policy optimization for sewer pipes under multi-state degradation by integrating Multi-State Degradation Models (MSDM) with Deep Reinforcement Learning (DRL). The authors formulate a prognostics-informed Markov Decision Process and train PPO-based agents in MSDM-driven environments, evaluating them against traditional heuristics in a Dutch case study (Breda, 25k+ pipes). Key findings show that DRL policies, particularly those trained with Gompertz-based MSDMs, adapt to pipe age and surpass condition-based, scheduled, and reactive maintenance in cost efficiency while maintaining lower degradation levels. The study demonstrates the practical potential of DRL in PHM for long-lived civil infrastructure and highlights the value of incorporating prognostic outputs into the RL state for improved decision-making. Future work envisions partial observability, system-level expansion, and broader algorithmic comparisons to further enhance robust, explainable maintenance strategies.
Abstract
Large-scale infrastructure systems are crucial for societal welfare, and their effective management requires strategic forecasting and intervention methods that account for various complexities. Our study addresses two challenges within the Prognostics and Health Management (PHM) framework applied to sewer assets: modeling pipe degradation across severity levels and developing effective maintenance policies. We employ Multi-State Degradation Models (MSDM) to represent the stochastic degradation process in sewer pipes and use Deep Reinforcement Learning (DRL) to devise maintenance strategies. A case study of a Dutch sewer network exemplifies our methodology. Our findings demonstrate the model's effectiveness in generating intelligent, cost-saving maintenance strategies that surpass heuristics. The learned policy adapts its management strategy to the pipe's age, opting for a passive approach for newer pipes and transitioning to active strategies for older ones to prevent failures and reduce costs. This research highlights DRL's potential in optimizing maintenance policies. Future research will aim to improve the model by incorporating partial observability, exploring various reinforcement learning algorithms, and extending this methodology to comprehensive infrastructure management.
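To make the setup more concrete, the following is a minimal sketch of a multi-state degradation environment with an age-dependent, Gompertz-style hazard, together with two heuristic baselines of the kind the abstract refers to. It is not the authors' implementation: the number of severity levels, the hazard parameters `A` and `B`, the action set, the costs, and all function names are hypothetical choices made for illustration only. A DRL agent would replace the hand-written policies with a learned mapping from (state, age) to action.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the study.
N_STATES = 5                 # severity levels 0 (pristine) .. 4 (failed)
FAILED = N_STATES - 1
A, B = 0.01, 0.05            # Gompertz hazard h(t) = A * exp(B * t)
COSTS = {"nothing": 0.0, "maintain": 1.0, "replace": 10.0}
FAILURE_PENALTY = 50.0       # charged for every year spent in the failed state


def gompertz_step_prob(age, dt=1.0):
    """Probability of degrading by one severity level within dt years.

    Uses the hazard integrated over [age, age + dt].
    """
    cumulative = (A / B) * (math.exp(B * (age + dt)) - math.exp(B * age))
    return 1.0 - math.exp(-cumulative)


def step(state, age, action, rng):
    """Apply an action, then one year of degradation.

    Returns (next_state, next_age, cost).
    """
    cost = COSTS[action]
    if action == "replace":
        state, age = 0, 0.0              # new pipe
    elif action == "maintain":
        state = max(state - 1, 0)        # improve by one severity level
    if state < FAILED and rng.random() < gompertz_step_prob(age):
        state += 1
    if state == FAILED:
        cost += FAILURE_PENALTY
    return state, age + 1.0, cost


def reactive_policy(state, age):
    """Replace only after failure."""
    return "replace" if state == FAILED else "nothing"


def condition_based_policy(state, age, threshold=3):
    """Maintain once severity reaches a threshold; replace after failure."""
    if state == FAILED:
        return "replace"
    return "maintain" if state >= threshold else "nothing"


def simulate(policy, horizon=100, seed=0):
    """Total cost of following a policy for `horizon` years on one pipe."""
    rng = random.Random(seed)
    state, age, total = 0, 0.0, 0.0
    for _ in range(horizon):
        action = policy(state, age)
        state, age, cost = step(state, age, action, rng)
        total += cost
    return total
```

Calling `simulate(reactive_policy)` and `simulate(condition_based_policy)` over many seeds gives a rough cost comparison between the two baselines under these assumed parameters. Because the transition probability grows with age, a policy that conditions on age as well as on severity has more information to work with, which is consistent with the age-adaptive behavior described above.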
