Input Convex Encoder-Only Transformer for Fast and Gradient-Stable MPC in Building Demand Response

Kaipeng Xu; Zhuo Zhi; Keyue Jiang

Abstract

Learning-based Model Predictive Control (MPC) has emerged as a powerful strategy for building demand response. However, its practical deployment is often hindered by the non-convex optimization problems induced by standard neural network models. These problems lead to long solver times and suboptimal solutions, making real-time control over long horizons challenging. Input Convex Neural Networks (ICNNs), such as Input-Convex Long Short-Term Memory networks (IC-LSTMs), have been developed to address the convexity issue, but their recurrent architectures suffer from high computational cost and gradient instability as the prediction horizon increases. To overcome these limitations, this paper introduces the Input-Convex Encoder-only Transformer (IC-EoT), a novel architecture that combines the parallel processing capability of the Transformer with the guaranteed tractability of input convexity. The IC-EoT was developed and evaluated in a high-fidelity co-simulation framework that uses the Energym Python library to interface with the EnergyPlus building simulator. It was compared against its recurrent convex counterpart (IC-LSTM) and against standard non-convex models. The results demonstrate that the IC-EoT is structurally immune to the gradient instability that affects recurrent ICNNs while maintaining comparable predictive accuracy. More critically, it substantially reduces MPC solver times, and this speed advantage grows with the prediction horizon: the IC-EoT is 2.7 to 8.3 times faster than the IC-LSTM across horizons from one to eight hours. This gain in computational efficiency makes the IC-EoT a robust and practical solution, enabling effective real-time MPC for building energy management over realistic decision-making horizons.
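The input-convexity property that ICNNs rely on can be made concrete with a small example. The sketch below is the standard fully connected ICNN construction (in the style of Amos et al.'s input convex networks), not the IC-EoT proposed in this paper. The hidden-to-hidden and output weights are non-negative and ReLU is convex and non-decreasing, so the scalar output is convex in the input. The class name `ICNN`, the layer widths, and the absolute-value parameterisation of non-negative weights are illustrative assumptions. The script then checks the convexity inequality numerically on random point pairs.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


class ICNN:
    """Hypothetical minimal fully connected input-convex network.

    z_1     = relu(W_x[0] @ x + b[0])
    z_{k+1} = relu(W_z[k] @ z_k + W_x[k+1] @ x + b[k+1])
    f(x)    = w_out @ z_L
    f is convex in x because every W_z and w_out is non-negative, the
    input pass-through terms W_x @ x are affine, and ReLU is convex and
    non-decreasing.
    """

    def __init__(self, n_in, widths, rng):
        # Input pass-through weights may have any sign.
        self.W_x = [rng.normal(size=(w, n_in)) for w in widths]
        self.b = [rng.normal(size=w) for w in widths]
        # Hidden-to-hidden and output weights are made non-negative via abs().
        self.W_z = [np.abs(rng.normal(size=(widths[k + 1], widths[k])))
                    for k in range(len(widths) - 1)]
        self.w_out = np.abs(rng.normal(size=widths[-1]))

    def __call__(self, x):
        z = relu(self.W_x[0] @ x + self.b[0])
        for k, W_z in enumerate(self.W_z):
            z = relu(W_z @ z + self.W_x[k + 1] @ x + self.b[k + 1])
        return float(self.w_out @ z)


def midpoint_gap(f, x, y, t):
    # For convex f: t f(x) + (1 - t) f(y) - f(t x + (1 - t) y) >= 0.
    return t * f(x) + (1 - t) * f(y) - f(t * x + (1 - t) * y)


net = ICNN(n_in=3, widths=[16, 16, 8], rng=rng)
gaps = [midpoint_gap(net, rng.normal(size=3), rng.normal(size=3), rng.uniform())
        for _ in range(1000)]
# Allow a tiny negative tolerance for floating-point rounding.
print(min(gaps) >= -1e-8)
```

In an MPC setting, convexity of the learned predictor in the control inputs is what allows the resulting optimization to be posed as a convex problem, given a suitably chosen cost and constraint set.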

Paper Structure (33 sections, 11 equations, 11 figures, 7 tables)

Introduction
Related Work
The Transformer Architecture
Progression of Input-Convex Neural Networks
Input-Convex Feed-Forward Networks
Input-Convex Recurrent Neural Networks
Input-Convex Long Short-Term Memory
Identifying the Research Gap
Input Convex Encoder-only Transformer
Architectural Choice: Encoder-Only Transformer
Architecture of the Input-Convex Encoder-Only Transformer
Convexity of the Input-Convex Encoder-Only Transformer
Toy Examples: Surface Fitting
Case Study
Simulation Framework

Figures (11)

Figure 1: The overall architecture of the IC-EoT. The input sequence is processed through a stack of N identical blocks, each containing convex-by-design attention and feed-forward layers.
Figure 2: The detailed architecture of the proposed Convex Multi-Head Attention layer. A shared non-negative linear projection is scaled by non-negative diagonal matrices to form Q, K, and V. These are processed by parallel Convex Dot-Product Attention heads, which utilize the novel Convex-r-Softmax operator.
Figure 3: Fitting result for function $f_1(x,y)$. The left panel shows the comparison between the true surface and the IC-EoT prediction. The right panel shows the same comparison for IC-LSTM.
Figure 4: Fitting result for function $f_2(x,y)$.
Figure 5: Fitting result for function $f_3(x,y)$.

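The query/key/value stage described in the Figure 2 caption can be sketched in NumPy. Under that description, Q, K, and V all come from one shared non-negative projection of the input, each scaled by its own non-negative diagonal matrix. Every entry of Q, K, and V is therefore a linear function of the input with non-negative coefficients, and the sketch checks this through monotonicity. Several details are assumptions made for illustration: the function names, the softplus parameterisation of non-negativity, the use of one diagonal per Q/K/V shared across heads, and the even split of columns across heads. The caption does not define the Convex-r-Softmax operator, so the sketch stops before attention is computed.

```python
import numpy as np


def softplus(w):
    # Numerically stable log(1 + exp(w)); maps any real weight to a positive one.
    return np.logaddexp(0.0, w)


def convex_qkv(X, W_raw, dq_raw, dk_raw, dv_raw, n_heads):
    """Hypothetical sketch of the Q/K/V stage described for Figure 2.

    X: (T, d_in) input sequence. Returns Q, K, V shaped (n_heads, T, d_head).
    """
    W = softplus(W_raw)            # shared projection (d_in, d_model), entries > 0
    H = X @ W                      # shared projected sequence (T, d_model)
    # H @ diag(d) scales column j of H by d[j]; broadcasting does the same.
    Q = H * softplus(dq_raw)
    K = H * softplus(dk_raw)
    V = H * softplus(dv_raw)
    T, d_model = H.shape
    d_head = d_model // n_heads

    def split_heads(M):
        # (T, d_model) -> (n_heads, T, d_head): one block of columns per head.
        return M.reshape(T, n_heads, d_head).transpose(1, 0, 2)

    return split_heads(Q), split_heads(K), split_heads(V)


rng = np.random.default_rng(1)
T, d_in, d_model, n_heads = 6, 4, 8, 2
X = rng.normal(size=(T, d_in))
params = (rng.normal(size=(d_in, d_model)), rng.normal(size=d_model),
          rng.normal(size=d_model), rng.normal(size=d_model))
Q, K, V = convex_qkv(X, *params, n_heads=n_heads)
print(Q.shape)  # (2, 6, 4)

# Raising every input never lowers any entry of Q, K or V, because each entry
# is a linear function of X with non-negative coefficients.
Q2, K2, V2 = convex_qkv(X + 0.5, *params, n_heads=n_heads)
print(all((B >= A).all() for A, B in [(Q, Q2), (K, K2), (V, V2)]))
```

Sharing a single projection and varying only the diagonal scalings keeps the non-negativity constraint in one place, which is one plausible reading of the "shared projection" in the caption.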