The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning

Henrique Donâncio, Laurent Vercouter, Harald Roclawski

TL;DR

The paper addresses the gap between synthetic RL benchmarks and real-world decision-making by introducing a pump scheduling RL testbed based on a real-world water distribution network. It provides a validated hydraulic simulator, three years of operational data at one-minute resolution, and a baseline RL task formulation (a POMDP approximated as an MDP) to enable end-to-end policy evaluation. Offline RL benchmarks (BCQ, DDQN, Maxmin Q-learning, REM) show that policies learned solely from demonstrations can achieve energy savings on the order of a few percent while meeting safety constraints, in some cases matching or exceeding human performance. The testbed opens avenues for research in representation learning, inverse RL, multi-objective RL, and safe exploration, and can be extended to continuous control to study smoother pump actuation and more dynamic system behavior.

Abstract

Deep Reinforcement Learning (DRL) has demonstrated impressive results in domains such as games and robotics, where task formulations are well-defined. However, few DRL benchmarks are grounded in complex, real-world environments, where safety constraints, partial observability, and the need for hand-engineered task representations pose significant challenges. To help bridge this gap, we introduce a testbed based on the pump scheduling problem in a real-world water distribution facility. The task involves controlling pumps to ensure a reliable water supply while minimizing energy consumption and respecting the constraints of the system. Our testbed includes a realistic simulator, three years of high-resolution (1-minute) operational data from human-led control, and a baseline RL task formulation. This testbed supports a wide range of research directions, including offline RL, safe exploration, inverse RL, and multi-objective optimization.

Paper Structure

This paper contains 24 sections, 6 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Water distribution system overview. The system has four pumps with fixed speed (ON/OFF) and two elevated water storage tanks.
  • Figure 2: Overview of human-led water system operation. (a–c) Water consumption. (d–f) Tank level profiles. (g–i) Pump switching. (j–l) Electricity usage.
  • Figure 3: The dynamics of a POMDP and the RL learning process through the water distribution system simulator.
  • Figure 4: Benchmark results.
  • Figure 5: Tank levels (top row) and average daily pump switches (bottom row) for the best-performing policy of each offline RL algorithm: BCQ, DDQN, Maxmin Q-learning, and REM. All policies successfully maintain tank levels within safety limits while enforcing conservative switching patterns compared to human-led control.
  • ...and 3 more figures
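
The Figure 5 caption refers to two quantities: whether tank levels stay within safety limits, and how often pumps switch. The sketch below shows one way such quantities could be computed from a logged trajectory. It is an illustration only, not the paper's evaluation code: the function names, the 0/1 encoding of pump states, and the numeric limits in the usage lines are all assumptions made for this example.

```python
from typing import Sequence


def count_switches(pump_states: Sequence[int]) -> int:
    """Count ON/OFF transitions in a sequence of 0/1 states for one pump.

    Hypothetical helper: assumes one state per time step, 1 = ON, 0 = OFF.
    """
    return sum(
        1 for prev, cur in zip(pump_states, pump_states[1:]) if prev != cur
    )


def within_limits(levels: Sequence[float], low: float, high: float) -> bool:
    """Return True if every tank level lies in the closed interval [low, high].

    Hypothetical helper: the actual safety limits are not given in this summary.
    """
    return all(low <= level <= high for level in levels)


# Made-up data for illustration; not taken from the paper's dataset.
states = [0, 0, 1, 1, 1, 0, 1]
levels = [2.1, 2.4, 2.9]

switches = count_switches(states)            # transitions: 0->1, 1->0, 0->1
safe = within_limits(levels, low=2.0, high=3.0)
```

Applied per pump and per day, the first function would give a daily switch count of the kind plotted in the bottom row of Figure 5, and the second a pass/fail safety check on a tank-level trace.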