The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning
Henrique Donâncio, Laurent Vercouter, Harald Roclawski
TL;DR
The paper addresses the gap between synthetic RL benchmarks and real-world decision-making by introducing a pump scheduling testbed based on a real-world water distribution network. It provides a validated hydraulic simulator, three years of one-minute operational data, and a baseline RL task formulation (a POMDP approximated as an MDP) to enable end-to-end policy evaluation. Offline RL baselines (BCQ, DDQN, Maxmin Q-learning, REM) show that policies learned solely from demonstrations can achieve energy savings on the order of a few percent while meeting safety constraints, and are competitive with, or in some cases exceed, human performance. The testbed opens avenues for research in representation learning, inverse RL, multi-objective RL, and safe exploration, and can be extended to continuous control to study smoother pump actuation and more dynamic system behavior.
Abstract
Deep Reinforcement Learning (DRL) has demonstrated impressive results in domains such as games and robotics, where task formulations are well-defined. However, few DRL benchmarks are grounded in complex, real-world environments, where safety constraints, partial observability, and the need for hand-engineered task representations pose significant challenges. To help bridge this gap, we introduce a testbed based on the pump scheduling problem in a real-world water distribution facility. The task involves controlling pumps to ensure a reliable water supply while minimizing energy consumption and respecting the constraints of the system. Our testbed includes a realistic simulator, three years of high-resolution (1-minute) operational data from human-led control, and a baseline RL task formulation. This testbed supports a wide range of research directions, including offline RL, safe exploration, inverse RL, and multi-objective optimization.
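To make the trade-off in the task description concrete, the following is a minimal sketch of how a per-step reward could weigh energy use against a constraint on water storage. It is not the testbed's reward function: the names `TankLimits` and `step_reward`, the choice of a tank-level band as the constraint, and the weights are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TankLimits:
    """Hypothetical safe operating band for a storage tank level, in metres."""
    low_m: float
    high_m: float


def step_reward(
    power_kw: float,
    tank_level_m: float,
    limits: TankLimits,
    energy_weight: float = 1.0,        # assumed weight, not from the paper
    violation_penalty: float = 100.0,  # assumed penalty, not from the paper
) -> float:
    """Toy reward: penalise pump power, plus a fixed penalty outside the band."""
    reward = -energy_weight * power_kw
    if not (limits.low_m <= tank_level_m <= limits.high_m):
        reward -= violation_penalty
    return reward


limits = TankLimits(low_m=2.0, high_m=8.0)
in_band = step_reward(power_kw=10.0, tank_level_m=5.0, limits=limits)
out_of_band = step_reward(power_kw=10.0, tank_level_m=1.0, limits=limits)
```

Under these assumed weights, a step that keeps the level inside the band is penalised only for its energy use, while a step that leaves the band receives the additional fixed penalty. A multi-objective or safe-exploration formulation would treat the two terms separately rather than summing them.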
