End-to-End Differentiable Predictive Control with Guaranteed Constraint Satisfaction and Feasibility for Building Demand Response

Kaipeng Xu, Zhuo Zhi, Ruixuan Zhao, Keyue Jiang

Abstract

The high energy consumption of buildings creates a critical need for advanced control strategies such as Demand Response (DR). Differentiable Predictive Control (DPC) has emerged as a promising method for learning explicit control policies, yet conventional DPC frameworks are hindered by three key limitations: simplistic dynamics models with limited expressiveness, a decoupled training paradigm that fails to optimize for closed-loop performance, and a lack of practical safety guarantees under realistic assumptions. To address these shortcomings, this paper proposes a novel End-to-End Differentiable Predictive Control (E2E-DPC) framework. Our approach uses an Encoder-Only Transformer to model the complex system dynamics and employs a unified, performance-oriented loss to jointly train the model and the control policy. Crucially, we introduce an online tube-based constraint tightening method that provides theoretical guarantees of recursive feasibility and constraint satisfaction without requiring complex offline computation of terminal sets. The framework is validated in a high-fidelity EnergyPlus simulation, controlling a multi-zone building for a DR task. The results demonstrate that the proposed method with guarantees achieves near-perfect constraint satisfaction (a reduction of over 99% in violations compared to the baseline) at the cost of only a minor increase in electricity expenditure. This work provides a deployable, performance-driven control solution for building energy management and establishes a new pathway for developing verifiable learning-based control systems under milder assumptions.

Paper Structure (32 sections, 3 theorems, 28 equations, 4 figures, 3 tables)

Key Result

Theorem 1

Let Assumption assum:bounded_disturbances hold. The design choices for the tube parameters $(P, K, \rho, \varepsilon_k)$, selected according to Eqs. eq:dare_p-eq:eps_seq_impl, satisfy the conditions of incremental stabilizability as defined in Assumption 1 of kohler2018novel.
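
To make the role of the contraction rate $\rho$ and the tightening sequence $\varepsilon_k$ more concrete, the following is a minimal scalar sketch of tube-based constraint tightening. It is not the paper's construction: the equations that define $(P, K, \rho, \varepsilon_k)$ are not reproduced in this summary, so the recursion $\varepsilon_{k+1} = \rho\,\varepsilon_k + \bar{w}$, the disturbance bound `w_bar`, and both function names are assumptions chosen for illustration. The 19 to 24 °C band is the comfort band quoted in the Figure 3 caption.

```python
def tightening_sequence(w_bar, rho, horizon):
    """Return [eps_0, ..., eps_horizon] with eps_0 = 0 and
    eps_{k+1} = rho * eps_k + w_bar (assumed recursion, not the paper's)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1) for a contracting tube")
    if w_bar < 0.0:
        raise ValueError("w_bar must be non-negative")
    eps = [0.0]
    for _ in range(horizon):
        eps.append(rho * eps[-1] + w_bar)
    return eps


def tightened_bounds(lower, upper, eps):
    """Shrink the interval [lower, upper] by eps_k at each prediction step k."""
    bounds = []
    for e in eps:
        lo, hi = lower + e, upper - e
        if lo > hi:
            raise ValueError("tube is wider than the constraint set")
        bounds.append((lo, hi))
    return bounds


# Hypothetical numbers: 0.2 degC disturbance bound, contraction rate 0.5.
eps = tightening_sequence(w_bar=0.2, rho=0.5, horizon=4)
bands = tightened_bounds(19.0, 24.0, eps)
for k, (lo, hi) in enumerate(bands):
    print(f"k={k}: nominal temperature must stay in [{lo:.3f}, {hi:.3f}] degC")
```

Under these assumptions the sequence is non-decreasing and stays below $\bar{w}/(1-\rho)$, so the tightened band never collapses as long as that limit is smaller than half the width of the comfort band.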

Figures (4)

  • Figure 1: The two-stage E2E-DPC training procedure, where green arrows denote the forward propagation for loss calculation and red arrows represent the backward propagation of gradients for parameter updates. The initial phase trains only the dynamics model $f_x$ against ground-truth data. The joint training phase unrolls the closed-loop system and backpropagates the performance-oriented E2E loss to update both $f_x$ and the policy $\pi_u$.
  • Figure 2: The co-simulation framework for online deployment of the DPC controller. The explicit control policy ($\pi_u$) receives real-time Building States (red line) from the EnergyPlus simulation and External Inputs (blue lines) such as electricity prices. It then computes and sends the optimal Control Actions (green line) back to the system. The dynamics model ($f_x$) is used offline for training but is not in the real-time control loop; its predictions (dotted line) are for analysis purposes only.
  • Figure 3: Comparison of indoor zone temperatures under the three control strategies over a 3-day period. The subfigures are arranged vertically to maximize clarity. The shaded area represents the comfort band ($[19, 24]\,^{\circ}\text{C}$). (a) DPC-C shows moderate violations. (b) E2E-DPC exhibits frequent and severe violations. (c) E2E-DPC-G successfully maintains all temperatures within the comfort band.
  • Figure 4: Total electricity consumption (Fa_E_All) and one-step-ahead predictions versus the TOU electricity price for the three controllers. The vertical arrangement allows for a detailed view of each controller's load-shifting strategy.
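
The joint training phase in the Figure 1 caption unrolls the closed-loop system and evaluates a performance-oriented loss. The sketch below shows only the forward pass of such a rollout, for a toy scalar system in plain Python. Everything in it is an assumption made for illustration: the linear stand-in for $f_x$, the proportional stand-in for $\pi_u$, the quadratic comfort penalty, and all numerical values. The paper's model is a Transformer and its loss terms are not given in this summary. The backward pass is omitted because it would need an automatic-differentiation library.

```python
def rollout_loss(x0, prices, outdoor, a, b, c, k_gain, setpoint,
                 lower, upper, penalty):
    """Unroll a toy closed loop and return (loss, energy_cost, violation).

    Dynamics stand-in for f_x:  x_next = a * x + b * u + c * t_out
    Policy stand-in for pi_u:   u = max(0, k_gain * (setpoint - x))
    """
    x = x0
    energy_cost = 0.0
    violation = 0.0
    for price, t_out in zip(prices, outdoor):
        u = max(0.0, k_gain * (setpoint - x))   # non-negative heating power
        x = a * x + b * u + c * t_out           # one-step prediction
        energy_cost += price * u                # time-of-use electricity cost
        # Squared distance outside the comfort band [lower, upper].
        violation += max(0.0, lower - x) ** 2 + max(0.0, x - upper) ** 2
    return energy_cost + penalty * violation, energy_cost, violation


# Hypothetical two-step horizon with a price that doubles in the second step.
loss, cost, viol = rollout_loss(
    x0=20.0, prices=[1.0, 2.0], outdoor=[10.0, 10.0],
    a=0.5, b=1.0, c=0.5, k_gain=2.0, setpoint=21.0,
    lower=19.0, upper=24.0, penalty=10.0,
)
print(loss, cost, viol)
```

In an end-to-end setup, this scalar loss would be differentiated with respect to both the dynamics parameters and the policy parameters, which is what distinguishes joint training from fitting the model and the policy separately.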

Theorems & Definitions (4)

  • Theorem 1: Satisfaction of Incremental Stabilizability Conditions
  • Corollary 1: Recursive Feasibility and Constraint Satisfaction
  • Theorem 2: Probabilistic Feasibility Guarantee
  • Remark 1: Deterministic and Probabilistic Guarantees