Hierarchical RL-MPC for Demand Response Scheduling

Maximilian Bloor, Ehecatl Antonio Del Rio Chanona, Calvin Tsay

TL;DR

The paper tackles demand response scheduling for air separation units under volatile electricity prices by proposing a hierarchical RL-LMPC framework that pairs reinforcement learning with a lower-level linear model predictive controller. It compares a direct RL approach with a control-informed architecture where the RL agent provides setpoints to an LMPC, finding that the RL-LMPC scheme improves sample efficiency and constraint satisfaction while maintaining competitive economic performance. The ASU case study demonstrates load-shifting behavior and better handling of operational constraints due to the LMPC’s explicit constraint management. Overall, the work advances a practical hybrid control strategy that blends data-driven decision-making with traditional control to enable flexible operation in process industries.

Abstract

This paper presents a hierarchical framework for demand response optimization in air separation units (ASUs) that combines reinforcement learning (RL) with linear model predictive control (LMPC). We investigate two control architectures: a direct RL approach and a control-informed methodology where an RL agent provides setpoints to a lower-level LMPC. The proposed RL-LMPC framework demonstrates improved sample efficiency during training and better constraint satisfaction compared to direct RL control. Using an industrial ASU case study, we show that our approach successfully manages operational constraints while optimizing electricity costs under time-varying pricing. Results indicate that the RL-LMPC architecture achieves comparable economic performance to direct RL while providing better robustness and requiring fewer training samples to converge. The framework offers a practical solution for implementing flexible operation strategies in process industries, bridging the gap between data-driven methods and traditional control approaches.
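
The control-informed architecture described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ToyPlant` model, the price-threshold rule standing in for a trained RL agent, and the clipped tracking rule standing in for the LMPC are hypothetical and are not the paper's ASU model, agent, or controller. The sketch only shows the division of labor: the upper level chooses a production setpoint from the electricity price, and the lower level decides how to move toward it while respecting bounds and rate limits.

```python
from dataclasses import dataclass


@dataclass
class ToyPlant:
    """Hypothetical first-order production/storage model (not the paper's ASU)."""
    production: float = 1.0  # current production rate (normalized)
    storage: float = 5.0     # product inventory (normalized)
    demand: float = 1.0      # constant customer demand (assumed)

    def step(self, production_cmd: float) -> None:
        # First-order lag toward the commanded rate, then integrate storage.
        self.production += 0.5 * (production_cmd - self.production)
        self.storage += self.production - self.demand


def upper_level_policy(price: float, mean_price: float) -> float:
    """Stand-in for the RL agent: ask for more production when power is cheap."""
    return 1.2 if price < mean_price else 0.8


def lower_level_tracker(plant: ToyPlant, setpoint: float,
                        lo: float = 0.7, hi: float = 1.3,
                        max_step: float = 0.2) -> float:
    """Stand-in for the LMPC: track the setpoint under bounds and a rate limit.

    A real LMPC would solve a constrained optimization over a horizon;
    this one-step clipped rule only mimics its constraint-handling role.
    """
    target = min(max(setpoint, lo), hi)
    move = min(max(target - plant.production, -max_step), max_step)
    return plant.production + move


def run_episode(prices):
    """Run the two-level loop over a price profile; return cost and rates."""
    plant = ToyPlant()
    mean_price = sum(prices) / len(prices)
    cost = 0.0
    history = []
    for price in prices:
        setpoint = upper_level_policy(price, mean_price)  # upper level
        cmd = lower_level_tracker(plant, setpoint)        # lower level
        plant.step(cmd)
        cost += price * plant.production  # power use assumed proportional to rate
        history.append(plant.production)
    return cost, history


cost, history = run_episode([10.0, 10.0, 50.0, 50.0])
```

In this toy run the production rate rises during the two cheap periods and falls during the two expensive ones, which is the load-shifting pattern the abstract refers to. Because the lower level clips every move, the rate stays inside its bounds regardless of what the upper level requests; in the direct RL architecture there is no such layer between the agent's action and the plant.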

Paper Structure

This paper contains 13 sections, 15 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: RL-based control system diagrams: (a) with LMPC and (b) without LMPC
  • Figure 2: ASU process flowsheet. Manipulated variables are marked in blue and the product storage section is shaded in red
  • Figure 3: Learning curve for both the RL-LMPC and Direct agents with rolling mean and variance, truncated when the maximum reward is reached within the 10,000 timestep budget
  • Figure 4: Top: power demand and price profiles. Bottom: production rate for the RL-LMPC (blue, solid lines) and direct (orange, dash-dotted lines) agents
  • Figure 5: Storage, product impurity, and IRC temperature difference for the RL-LMPC (blue, solid lines) and direct (orange, dash-dotted lines) agents with the imposed constraints (gray, dashed lines)