Application of Soft Actor-Critic Algorithms in Optimizing Wastewater Treatment with Time Delays Integration

Esmaeel Mohammadi, Daniel Ortiz-Arroyo, Aviaja Anna Hansen, Mikkel Stokholm-Bjerregaard, Sebastien Gros, Akhil S Anand, Petar Durdevic

TL;DR

The paper addresses the optimization of phosphorus removal in wastewater treatment under time-delayed feedback by applying Soft Actor-Critic (SAC) within a delay-aware, LSTM-based simulator. It introduces three delay scenarios (no delay, constant delay, and random delay) and shows that delay-aware SAC, particularly when trained under random delays, achieves higher rewards, lower target deviations, reduced emissions, and lower costs than a traditional PID controller. The methodology combines a high-fidelity OWTP simulator, OpenAI Gym interfaces, and multi-environment SAC training to capture stochastic delays and multi-step state predictions. The findings support the practical viability of control based on deep reinforcement learning (DRL) for industrial processes, offering robustness to delays and improved environmental and economic performance. The work outlines future directions in multi-objective optimization and hybrid control approaches toward sustainable and adaptive wastewater treatment strategies.
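To make the three delay scenarios concrete, here is a minimal Python sketch under stated assumptions. `ToyPhosphorusEnv` is a hypothetical first-order stand-in for the paper's LSTM-based simulator, and `DelayedObservationEnv` is a hypothetical wrapper that delays only observations (not actions) by zero steps, a fixed number of steps, or a uniformly random number of steps up to `max_delay`. The simplified `reset`/`step` interface is loosely modeled on OpenAI Gym but does not depend on it.

```python
import random
from collections import deque


class ToyPhosphorusEnv:
    """Hypothetical stand-in for the simulator (NOT the paper's LSTM model):
    a first-order response of phosphorus concentration to a dosing action."""

    def __init__(self, target=0.5):
        self.target = target
        self.p = 1.0

    def reset(self):
        self.p = 1.0
        return self.p

    def step(self, action):
        # More dosing lowers the concentration; constants are illustrative only.
        self.p = 0.9 * self.p + 0.1 * max(0.0, 1.0 - action)
        reward = -abs(self.p - self.target)
        return self.p, reward


class DelayedObservationEnv:
    """Hypothetical wrapper: returns the observation from d steps ago.

    mode "none" -> d = 0, "constant" -> d = max_delay,
    "random" -> d drawn uniformly from 0..max_delay at every step.
    Only observations are delayed; the reward is returned immediately.
    """

    def __init__(self, env, mode="none", max_delay=3, seed=0):
        self.env = env
        self.mode = mode
        self.max_delay = max_delay
        self.rng = random.Random(seed)
        self.history = deque(maxlen=max_delay + 1)

    def _delay(self):
        if self.mode == "none":
            return 0
        if self.mode == "constant":
            return self.max_delay
        return self.rng.randint(0, self.max_delay)

    def _observe(self):
        # Early in an episode, fewer past observations exist than the delay.
        d = min(self._delay(), len(self.history) - 1)
        return self.history[-1 - d]

    def reset(self):
        obs = self.env.reset()
        self.history.clear()
        self.history.append(obs)
        return obs

    def step(self, action):
        obs, reward = self.env.step(action)
        self.history.append(obs)
        return self._observe(), reward
```

With `mode="constant"` and `max_delay=2`, the agent sees each concentration two steps late (the first steps repeat the initial observation); with `mode="random"`, the lag changes from step to step, which is the kind of stochastic feedback a random-delay agent would be trained on.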

Abstract

Wastewater treatment plants face unique challenges for process control due to their complex dynamics, slow time constants, and stochastic delays in observations and actions. These characteristics make conventional control methods, such as Proportional-Integral-Derivative (PID) controllers, suboptimal for achieving efficient phosphorus removal, a critical step in wastewater treatment for ensuring environmental sustainability. This study addresses these challenges with a novel deep reinforcement learning approach based on the Soft Actor-Critic algorithm, integrated with a custom simulator designed to model the delayed feedback inherent in wastewater treatment plants. The simulator incorporates Long Short-Term Memory networks for accurate multi-step state predictions, enabling realistic training scenarios. To account for the stochastic nature of delays, agents were trained under three delay scenarios: no delay, constant delay, and random delay. The results demonstrate that incorporating random delays into the reinforcement learning framework significantly improves phosphorus removal efficiency while reducing operational costs. Specifically, in the simulated environment the delay-aware agent achieved a 36% reduction in phosphorus emissions, a 55% higher reward, a 77% lower deviation from the regulatory limit, and 9% lower total costs than traditional control methods. These findings underscore the potential of reinforcement learning to overcome the limitations of conventional control strategies in wastewater treatment, providing an adaptive and cost-effective solution for phosphorus removal.
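The critic update at the core of Soft Actor-Critic can be illustrated with the standard entropy-regularized (soft) Bellman target from the SAC literature, which uses twin target critics and a temperature α. This is a minimal, framework-free Python sketch: the function name and the default values of `gamma` and `alpha` are illustrative assumptions rather than the paper's hyperparameters, and real implementations operate on batched tensors instead of scalars.

```python
def soft_q_target(reward, done, q1_next, q2_next, log_prob_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target y for one transition (s, a, r, s', done).

    q1_next, q2_next: target-critic values Q'(s', a') for a fresh action a'
    sampled from the current policy at s'; log_prob_next = log pi(a' | s').
    gamma and alpha defaults are illustrative, not taken from the paper.
    """
    # Clipped double-Q: use the smaller estimate to curb overestimation.
    min_q = min(q1_next, q2_next)
    # Soft state value: subtracting alpha * log pi acts as an entropy bonus.
    soft_value = min_q - alpha * log_prob_next
    # Do not bootstrap past a terminal state.
    return reward + gamma * (1.0 - float(done)) * soft_value
```

For example, `soft_q_target(-0.2, False, 1.0, 0.8, -1.5)` evaluates to about 0.889, that is, -0.2 + 0.99 × (0.8 + 0.2 × 1.5), while the same call with `done=True` returns the reward alone. Both critics are then regressed toward this target.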

Paper Structure

This paper contains 33 sections, 17 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: A system with three actions, subject to random action and observation delays.
  • Figure 2: The process of training the Soft Actor-Critic policy in the simulation environment.
  • Figure 3: Average cumulative reward per episode for SAC under different delay scenarios and for the multi-experiment setup with a non-linear reward function.
  • Figure 4: The inputs and outputs of the phosphorus removal process in the WWTP, where $\mathbf{u}_t$, $\mathbf{x}_{e,t}$, and $\mathbf{y}_t$ denote the control, exogenous, and target variables, respectively.
  • Figure 5: Comparison of the existing PID control and the learned SAC policies on the wastewater treatment data from September 15, 2022.
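Because Figure 5 benchmarks the learned SAC policies against the plant's existing PID control, a minimal discrete PID sketch may help clarify what that baseline computes. This is a textbook positional-form PID in Python, not the plant's actual controller: the gains, setpoint, output limits, and the reverse-acting sign convention (error = measurement minus setpoint, so dosing rises when phosphorus is above target) are illustrative assumptions, and the sketch omits anti-windup and derivative filtering.

```python
class PID:
    """Discrete positional-form PID controller with output clamping.

    Gains, limits, and the sign convention are illustrative assumptions,
    not the plant's tuning.
    """

    def __init__(self, kp, ki, kd, setpoint, dt=1.0, out_min=0.0, out_max=float("inf")):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.dt = dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement):
        # Reverse-acting (assumption): concentration above setpoint -> positive
        # error -> larger dosing command.
        error = measurement - self.setpoint
        self.integral += error * self.dt
        # No derivative kick on the very first sample.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to actuator limits, e.g. a non-negative dosing rate.
        return min(max(u, self.out_min), self.out_max)
```

With `PID(kp=2.0, ki=0.5, kd=0.0, setpoint=0.5)`, a measurement of 1.0 yields 2.0 × 0.5 + 0.5 × 0.5 = 1.25; if the next measurement sits exactly at the setpoint, only the accumulated integral term remains, giving 0.25. The controller reacts to whatever measurement it receives, so delayed measurements translate directly into delayed corrections.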