End-to-End Differentiable Predictive Control with Guaranteed Constraint Satisfaction and Feasibility for Building Demand Response

Kaipeng Xu; Zhuo Zhi; Ruixuan Zhao; Keyue Jiang

Abstract

The high energy consumption of buildings creates a critical need for advanced control strategies such as Demand Response (DR). Differentiable Predictive Control (DPC) has emerged as a promising method for learning explicit control policies, yet conventional DPC frameworks are hindered by three key limitations: simplistic dynamics models with limited expressiveness, a decoupled training paradigm that does not optimize closed-loop performance, and a lack of practical safety guarantees under realistic assumptions. To address these shortcomings, this paper proposes an End-to-End Differentiable Predictive Control (E2E-DPC) framework. Our approach uses an Encoder-Only Transformer to model the complex system dynamics and a unified, performance-oriented loss to train the model and the control policy jointly. Crucially, we introduce an online tube-based constraint tightening method that provides theoretical guarantees of recursive feasibility and constraint satisfaction without requiring complex offline computation of terminal sets. The framework is validated in a high-fidelity EnergyPlus simulation of a multi-zone building performing a DR task. The results show that the variant with guarantees achieves near-perfect constraint satisfaction, reducing violations by over 99% compared to the baseline, at the cost of only a minor increase in electricity expenditure. This work provides a deployable, performance-driven control solution for building energy management and establishes a new pathway toward verifiable learning-based control systems under milder assumptions.
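The idea of training the dynamics model and the policy jointly can be illustrated with a deliberately small sketch. It mirrors the two-stage procedure in the Figure 1 caption. Everything in it is an assumption for illustration:

- a hypothetical single-zone linear building generates the data;
- a linear `f_x` stands in for the Encoder-Only Transformer;
- `pi_u` is a hypothetical three-parameter softplus policy;
- the loss is an illustrative choice, not the paper's loss. It combines price-weighted heating power, a squared penalty outside the 19 to 24 °C band, and a prediction-error term weighted by `lam_pred`;
- central finite differences replace automatic differentiation, so only the standard library is needed.

```python
import math

# Hypothetical single-zone "true" building, used only to generate training data.
A_TRUE, B_TRUE, C_TRUE = 0.9, 1.0, 0.1
COMFORT = (19.0, 24.0)  # comfort band in degC, as in Figure 3

def softplus(z):
    # Smooth, non-negative map that keeps heating power >= 0.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def make_data(n=48):
    # Outdoor temperature d, illustrative TOU price p, logged heating power u.
    d = [10.0 + 5.0 * math.sin(2 * math.pi * k / 24) for k in range(n)]
    p = [0.3 if 8 <= k % 24 < 20 else 0.1 for k in range(n)]
    u = [1.1 + 0.3 * math.sin(0.7 * k) for k in range(n)]
    x = [21.0]
    for k in range(n):
        x.append(A_TRUE * x[k] + B_TRUE * u[k] + C_TRUE * d[k])
    return x, u, d, p

def f_x(theta, x, u, d):
    # Linear stand-in for the Encoder-Only Transformer dynamics model.
    return theta[0] * x + theta[1] * u + theta[2] * d

def pi_u(phi, x, p):
    # Hypothetical explicit policy: heating power from temperature and price.
    return softplus(phi[0] + phi[1] * (21.0 - x) + phi[2] * p)

def model_mse(theta, data):
    x, u, d, _ = data
    errs = [(f_x(theta, x[k], u[k], d[k]) - x[k + 1]) ** 2 for k in range(len(u))]
    return sum(errs) / len(errs)

def e2e_loss(params, data, lam_pred=1.0, lam_comfort=10.0):
    # Unroll the closed loop (pi_u -> f_x) and score cost plus comfort violations;
    # the prediction term keeps f_x anchored to the measured data.
    theta, phi = params[:3], params[3:]
    x_data, _, d, p = data
    xk, total = x_data[0], 0.0
    for k in range(len(d)):
        uk = pi_u(phi, xk, p[k])
        xk = f_x(theta, xk, uk, d[k])
        viol = max(0.0, xk - COMFORT[1]) ** 2 + max(0.0, COMFORT[0] - xk) ** 2
        total += p[k] * uk + lam_comfort * viol
    return total / len(d) + lam_pred * model_mse(theta, data)

def num_grad(fn, params, h=1e-5):
    # Central finite differences replace automatic differentiation here.
    grad = []
    for i in range(len(params)):
        up, dn = list(params), list(params)
        up[i] += h
        dn[i] -= h
        grad.append((fn(up) - fn(dn)) / (2 * h))
    return grad

def descend(fn, params, steps=300, lr=1e-3):
    # Gradient descent with backtracking: a step is kept only if it lowers fn.
    params, best = list(params), fn(params)
    for _ in range(steps):
        grad = num_grad(fn, params)
        step = lr
        while step > 1e-12:
            trial = [w - step * g for w, g in zip(params, grad)]
            val = fn(trial)
            if val < best:
                params, best = trial, val
                break
            step *= 0.5
    return params, best

data = make_data()

# Phase 1: fit only the dynamics model to logged data.
theta_init = [0.5, 0.5, 0.5]
theta, mse_fit = descend(lambda th: model_mse(th, data), theta_init, steps=500)

# Phase 2: update model and policy together on the closed-loop loss.
params_init = theta + [0.0, 0.5, -1.0]
loss_init = e2e_loss(params_init, data)
params, loss_final = descend(lambda w: e2e_loss(w, data), params_init)
```

Keeping a prediction-error term in the joint loss is one way to stop the model from drifting toward dynamics that merely flatter the policy. Whether and how the paper regularizes $f_x$ during the joint phase is not stated on this page.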

Paper Structure (32 sections, 3 theorems, 28 equations, 4 figures, 3 tables)

Introduction
Related Work
Differentiable Predictive Control
Performance-Oriented and End-to-End Training
Approaches to Robust Constraint Satisfaction in MPC
The End-to-End Differentiable Predictive Control Framework
General Problem Formulation
Differentiable Closed-Loop System
System Dynamics Model ($f_x$)
Control Policy ($\pi_u$)
Performance-Oriented Loss Design
End-to-End Training Procedure
Guarantee Mechanism for Robust Constraint Satisfaction
Tube-Based Constraint Tightening
Probabilistic Certification of the Learned Policy
...and 17 more sections

Key Result

Theorem 1

Let the bounded-disturbance assumption hold. Then the tube parameters $(P, K, \rho, \varepsilon_k)$ satisfy the incremental stabilizability conditions defined in Assumption 1 of Köhler et al. (2018). Here the parameters are selected according to the paper's design equations, from the discrete algebraic Riccati equation for $P$ through the implemented sequence $\varepsilon_k$.
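The design equations behind Theorem 1 are not reproduced on this page. As an illustration only, the sketch below instantiates a standard version of such a design for a hypothetical scalar (single-zone) linear model $x_{k+1} = a x_k + b u_k + w_k$ with $|w_k| \le \bar{w}$. The four quantities are:

- $P$, from the discrete algebraic Riccati equation;
- $K$, the associated LQR gain;
- $\rho = |a + bK|$, the error contraction rate;
- $\varepsilon_k = \bar{w} \sum_{j=0}^{k-1} \rho^j$, the tube radius used to tighten the comfort band.

The model coefficients, weights `q` and `r`, and the disturbance bound `w_bar` are assumptions. The paper's equations for its learned model may differ.

```python
def scalar_dare(a, b, q, r, iters=10_000, tol=1e-12):
    # Riccati fixed-point iteration for P = q + a^2 P - (a b P)^2 / (r + b^2 P).
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p

def tube_parameters(a, b, q, r, w_bar, horizon):
    p = scalar_dare(a, b, q, r)
    k = -a * b * p / (r + b * b * p)  # LQR gain for the ancillary feedback u = v + k (x - z)
    rho = abs(a + b * k)              # error dynamics: e_{i+1} = (a + b k) e_i + w_i
    # eps_i = w_bar * (1 + rho + ... + rho^(i-1)) bounds |e_i| when e_0 = 0.
    eps = [w_bar * sum(rho ** j for j in range(i)) for i in range(horizon + 1)]
    return p, k, rho, eps

def tightened_bounds(x_min, x_max, eps):
    # The nominal prediction z_i must satisfy x_min + eps_i <= z_i <= x_max - eps_i.
    return [(x_min + e, x_max - e) for e in eps]

# Illustrative numbers for one thermal zone and the 19-24 degC comfort band.
a, b, q, r, w_bar = 0.95, 0.2, 1.0, 0.1, 0.1
P, K, rho, eps = tube_parameters(a, b, q, r, w_bar, horizon=12)
bounds = tightened_bounds(19.0, 24.0, eps)
```

For a scalar model the contraction rate follows directly from the closed-loop pole. For a multi-zone or learned nonlinear model, $\rho$ and the disturbance bound would have to be established in the $P$-weighted norm, which this sketch does not attempt.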

Figures (4)

Figure 1: The two-stage E2E-DPC training procedure, where green arrows denote the forward propagation for loss calculation and red arrows represent the backward propagation of gradients for parameter updates. The initial phase trains only the dynamics model $f_x$ against ground-truth data. The joint training phase unrolls the closed-loop system and backpropagates the performance-oriented E2E loss to update both $f_x$ and the policy $\pi_u$.
Figure 2: The co-simulation framework for online deployment of the DPC controller. The explicit control policy ($\pi_u$) receives real-time Building States (red line) from the EnergyPlus simulation and External Inputs (blue lines) such as electricity prices. It then computes and sends the optimal Control Actions (green line) back to the system. The dynamics model ($f_x$) is used offline for training but is not in the real-time control loop; its predictions (dotted line) are for analysis purposes only.
Figure 3: Comparison of indoor zone temperatures under the three control strategies over a 3-day period. The subfigures are arranged vertically to maximize clarity. The shaded area represents the comfort band ($[19, 24]\,^{\circ}\text{C}$). (a) DPC-C shows moderate violations. (b) E2E-DPC exhibits frequent and severe violations. (c) E2E-DPC-G successfully maintains all temperatures within the comfort band.
Figure 4: Total electricity consumption (Fa_E_All) and its one-step-ahead predictions versus the time-of-use (TOU) electricity price for the three controllers. The vertical arrangement gives a detailed view of each controller's load-shifting strategy.
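Figures 2 to 4 describe an online loop in which only the explicit policy acts on the building. They also describe two evaluation quantities: comfort violations and electricity expenditure under a TOU price. A minimal sketch of that loop, and of one plausible way to score it, follows. Its assumptions are:

- `stub_building_step` is a hypothetical first-order zone model standing in for the EnergyPlus co-simulation. No EnergyPlus interface is used or implied.
- `policy` is a hypothetical hand-written rule rather than a trained $\pi_u$.
- The tariff is illustrative.
- The metrics are assumed definitions, because the paper's exact metric formulas are not given on this page. They are kelvin-hours outside the $[19, 24]\,^{\circ}\text{C}$ band, cost as the sum of price times energy, and percentage reduction relative to a baseline.

```python
import math

COMFORT_LOW, COMFORT_HIGH = 19.0, 24.0  # comfort band from Figure 3, degC

def stub_building_step(temp, power, outdoor):
    # Hypothetical stand-in for one EnergyPlus time step; not an EnergyPlus API.
    return 0.9 * temp + 1.0 * power + 0.1 * outdoor

def policy(temp, price):
    # Hypothetical explicit policy pi_u: heat when cold, back off when price is high.
    return max(0.0, 1.0 + 0.8 * (21.0 - temp) - 2.0 * price)

def run_closed_loop(hours=72):
    # Only the policy runs online; no dynamics model is queried in this loop.
    temp, log = 21.0, []
    for k in range(hours):
        outdoor = 10.0 + 5.0 * math.sin(2 * math.pi * k / 24)
        price = 0.3 if 8 <= k % 24 < 20 else 0.1  # illustrative TOU tariff
        power = policy(temp, price)
        log.append((temp, power, price))
        temp = stub_building_step(temp, power, outdoor)
    return log

def violation_kelvin_hours(temps, dt_h=1.0):
    # Distance of each temperature sample from the comfort band, integrated over time.
    return sum((max(0.0, t - COMFORT_HIGH) + max(0.0, COMFORT_LOW - t)) * dt_h for t in temps)

def electricity_cost(powers, prices, dt_h=1.0):
    # Expenditure = sum of price * energy, with energy = power * dt.
    return sum(pw * pr * dt_h for pw, pr in zip(powers, prices))

def percent_reduction(baseline, new):
    # Relative reduction of a metric with respect to a baseline controller, in percent.
    return 100.0 * (1.0 - new / baseline)

log = run_closed_loop()
temps = [t for t, _, _ in log]
powers = [pw for _, pw, _ in log]
prices = [pr for _, _, pr in log]
violations = violation_kelvin_hours(temps)
cost = electricity_cost(powers, prices)
```

Running the same loop with two different policies and passing their violation totals to `percent_reduction` gives the kind of relative comparison quoted in the abstract. The numbers produced by this stub are not the paper's results.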

Theorems & Definitions (4)

Theorem 1: Satisfaction of Incremental Stabilizability Conditions
Corollary 1: Recursive Feasibility and Constraint Satisfaction
Theorem 2: Probabilistic Feasibility Guarantee
Remark 1: Deterministic and Probabilistic Guarantees