Application of Soft Actor-Critic Algorithms in Optimizing Wastewater Treatment with Time Delays Integration
Esmaeel Mohammadi, Daniel Ortiz-Arroyo, Aviaja Anna Hansen, Mikkel Stokholm-Bjerregaard, Sebastien Gros, Akhil S Anand, Petar Durdevic
TL;DR
The paper addresses the optimization of phosphorus removal in wastewater treatment under time-delayed feedback by applying Soft Actor-Critic (SAC) within a delay-aware, LSTM-based simulator. It introduces three delay scenarios (no delay, constant delay, random delay) and shows that delay-aware SAC, particularly when trained under random delays, achieves higher rewards, lower target deviations, reduced emissions, and lower costs than a traditional PID controller. The methodology combines a high-fidelity OWTP simulator, OpenAI Gym interfaces, and multi-environment SAC training to capture stochastic delays and produce multi-step state predictions. The findings support the practical viability of DRL-based control for industrial processes, which offers robustness to delays and improved environmental and economic performance. The work suggests future directions in multi-objective optimization and hybrid control approaches for sustainable and adaptive wastewater treatment.
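For reference, SAC maximizes an entropy-regularized return. The block below is the standard SAC objective from the general literature, not a formula reported in this section; the paper's specific reward function r and temperature α are not given here, so both are generic placeholders:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s_t)}\left[ \log \pi(a \mid s_t) \right]
```

Here ρ_π is the state-action distribution induced by the policy π, and α > 0 sets the trade-off between expected reward and policy entropy.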
Abstract
Wastewater treatment plants pose unique challenges for process control because of their complex dynamics, slow time constants, and stochastic delays in observations and actions. These characteristics make conventional control methods, such as Proportional-Integral-Derivative (PID) controllers, suboptimal for efficient phosphorus removal, a critical component of wastewater treatment for environmental sustainability. This study addresses these challenges with a novel deep reinforcement learning approach based on the Soft Actor-Critic algorithm, integrated with a custom simulator designed to model the delayed feedback inherent in wastewater treatment plants. The simulator uses Long Short-Term Memory (LSTM) networks for accurate multi-step state predictions, enabling realistic training scenarios. To account for the stochastic nature of delays, agents were trained under three delay scenarios: no delay, constant delay, and random delay. The results demonstrate that incorporating random delays into the reinforcement learning framework significantly improves phosphorus removal efficiency while reducing operational costs. Specifically, in the simulated environment the delay-aware agent achieved a 36% reduction in phosphorus emissions, a 55% higher reward, a 77% lower deviation from the regulatory target limit, and 9% lower total costs than traditional control methods. These findings underscore the potential of reinforcement learning to overcome the limitations of conventional control strategies in wastewater treatment, providing an adaptive and cost-effective solution for phosphorus removal.
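The three delay scenarios can be illustrated with a small observation-delay buffer. This is a minimal sketch under stated assumptions, not the authors' simulator: the class name `DelayedObservationBuffer`, its `mode`/`max_delay`/`seed` parameters, the uniform integer delay distribution, and the padding of early steps with the initial observation are all hypothetical choices. It delays observations only (the abstract also mentions action delays) and uses only the Python standard library rather than the Gym API.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Sequence


class DelayedObservationBuffer:
    """Hypothetical helper returning observations under a no/constant/random delay.

    mode="none"     -> the current observation is returned immediately
    mode="constant" -> the observation from exactly `max_delay` steps ago
    mode="random"   -> the observation from a delay drawn uniformly in [0, max_delay]
    """

    def __init__(self, mode: str = "none", max_delay: int = 0,
                 seed: Optional[int] = None) -> None:
        if mode not in ("none", "constant", "random"):
            raise ValueError(f"unknown delay mode: {mode}")
        if max_delay < 0:
            raise ValueError("max_delay must be non-negative")
        self.mode = mode
        # With no delay, only the newest observation ever needs to be stored.
        self.max_delay = 0 if mode == "none" else max_delay
        self.rng = random.Random(seed)
        # Ring buffer of the last max_delay + 1 observations; index -1 is newest.
        self.history: Deque[List[float]] = deque(maxlen=self.max_delay + 1)

    def reset(self, first_obs: Sequence[float]) -> List[float]:
        self.history.clear()
        # Assumption: pad with the initial observation so early steps can be delayed.
        for _ in range(self.max_delay + 1):
            self.history.append(list(first_obs))
        return list(first_obs)

    def step(self, obs: Sequence[float]) -> List[float]:
        self.history.append(list(obs))
        if self.mode == "none":
            delay = 0
        elif self.mode == "constant":
            delay = self.max_delay
        else:
            # Delays are sampled independently each step, so observations
            # may arrive out of order in this simplified model.
            delay = self.rng.randint(0, self.max_delay)
        # history[-1 - delay] is the observation recorded `delay` steps ago.
        return list(self.history[-1 - delay])


buf = DelayedObservationBuffer(mode="constant", max_delay=2)
buf.reset([0.0])
for t in range(1, 5):
    print(t, buf.step([float(t)]))
# prints: 1 [0.0], 2 [0.0], 3 [1.0], 4 [2.0] (one pair per line)
```

With `mode="constant"` and `max_delay=2`, the agent always acts on a state that is two steps stale; switching to `mode="random"` exposes it to varying staleness during training, which is the kind of variability the random-delay scenario is meant to capture.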
