
Input Convex Encoder-Only Transformer for Fast and Gradient-Stable MPC in Building Demand Response

Kaipeng Xu, Zhuo Zhi, Keyue Jiang

Abstract

Learning-based Model Predictive Control (MPC) has emerged as a powerful strategy for building demand response. However, its practical deployment is often hindered by the non-convex optimization problems induced by standard neural network models. These problems lead to long solver times and suboptimal solutions, making real-time control over long horizons challenging. Input Convex Neural Networks (ICNNs), such as Input-Convex Long Short-Term Memory networks (IC-LSTMs), have been developed to address the convexity issue, but their recurrent architectures suffer from high computational cost and gradient instability as the prediction horizon increases. To overcome these limitations, this paper introduces the Input-Convex Encoder-only Transformer (IC-EoT), a novel architecture that combines the parallel processing capabilities of the Transformer with the guaranteed tractability of input convexity. The IC-EoT was developed and evaluated in a high-fidelity co-simulation framework that uses the Energym Python library to interface with the EnergyPlus building simulator, and it was compared against its recurrent convex counterpart (IC-LSTM) and standard non-convex models. The results demonstrate that the IC-EoT is structurally immune to the gradient instability that affects recurrent ICNNs while maintaining comparable predictive accuracy. More importantly, it substantially reduces MPC solver times, and this speed advantage grows with the prediction horizon: the IC-EoT is 2.7 to 8.3 times faster than the IC-LSTM across horizons from one to eight hours. This gain in computational efficiency makes the IC-EoT a robust and practical solution, enabling effective real-time MPC for building energy management over realistic decision-making horizons.
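The convexity argument above rests on the standard ICNN recipe: weights acting on hidden states are kept non-negative and activations are convex and non-decreasing, so every layer preserves convexity in the input. The sketch below is a minimal, hypothetical illustration of that generic construction in plain Python, not the IC-LSTM or IC-EoT parameterization, which the abstract does not specify; `TinyICNN` and `min_convexity_gap` are illustrative names. It samples random point pairs and checks Jensen's inequality numerically.

```python
import random


def _relu(v):
    # Element-wise ReLU: convex and non-decreasing, so it preserves convexity
    return [max(0.0, a) for a in v]


def _matvec(W, v):
    # Plain-Python matrix-vector product (W is a list of rows)
    return [sum(w * a for w, a in zip(row, v)) for row in W]


class TinyICNN:
    """Hypothetical two-layer fully input-convex network (illustration only).

    z1 = relu(Wx0 @ x + b0)
    z2 = relu(Wz1 @ z1 + Wx1 @ x + b1)   with Wz1 >= 0
    y  = wz2 . z2 + wx2 . x + b2          with wz2 >= 0
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-1.0, 1.0)

        # Weights acting directly on the input x are unconstrained
        self.Wx0 = [[rand() for _ in range(n_in)] for _ in range(n_hidden)]
        self.Wx1 = [[rand() for _ in range(n_in)] for _ in range(n_hidden)]
        self.wx2 = [rand() for _ in range(n_in)]
        self.b0 = [rand() for _ in range(n_hidden)]
        self.b1 = [rand() for _ in range(n_hidden)]
        self.b2 = rand()
        # Weights acting on hidden states must be non-negative; abs() enforces
        # this at initialization (a trainer would clip after each update instead)
        self.Wz1 = [[abs(rand()) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.wz2 = [abs(rand()) for _ in range(n_hidden)]

    def __call__(self, x):
        z1 = _relu([a + b for a, b in zip(_matvec(self.Wx0, x), self.b0)])
        z2 = _relu([a + b + c for a, b, c in
                    zip(_matvec(self.Wz1, z1), _matvec(self.Wx1, x), self.b1)])
        return (sum(w * a for w, a in zip(self.wz2, z2))
                + sum(w * a for w, a in zip(self.wx2, x)) + self.b2)


def min_convexity_gap(f, n_in, n_trials=1000, seed=1):
    """Smallest sampled value of t*f(x) + (1-t)*f(y) - f(t*x + (1-t)*y).
    For a convex f this is never negative (up to floating-point rounding)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(n_trials):
        x = [rng.uniform(-2.0, 2.0) for _ in range(n_in)]
        y = [rng.uniform(-2.0, 2.0) for _ in range(n_in)]
        t = rng.random()
        mid = [t * a + (1 - t) * b for a, b in zip(x, y)]
        worst = min(worst, t * f(x) + (1 - t) * f(y) - f(mid))
    return worst


net = TinyICNN(n_in=3, n_hidden=8)
print(min_convexity_gap(net, n_in=3) >= -1e-9)  # True: no sampled violation
```

Why this matters for MPC: if the model's prediction enters the objective through a convex, non-decreasing cost, the composition stays convex in the control inputs, so a local solver cannot get trapped in a spurious local minimum of that objective.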

Paper Structure

This paper contains 33 sections, 11 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: The overall architecture of the IC-EoT. The input sequence is processed through a stack of N identical blocks, each containing convex-by-design attention and feed-forward layers.
  • Figure 2: The detailed architecture of the proposed Convex Multi-Head Attention layer. A shared non-negative linear projection is scaled by non-negative diagonal matrices to form Q, K, and V. These are processed by parallel Convex Dot-Product Attention heads, which utilize the novel Convex-r-Softmax operator.
  • Figure 3: Fitting result for function $f_1(x,y)$. The left panel shows the comparison between the true surface and the IC-EoT prediction. The right panel shows the same comparison for IC-LSTM.
  • Figure 4: Fitting result for function $f_2(x,y)$.
  • Figure 5: Fitting result for function $f_3(x,y)$.
  • ...and 6 more figures
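The query/key/value construction described in the Figure 2 caption (one shared non-negative linear projection, rescaled by non-negative diagonal matrices) can be sketched as below. This is a hypothetical illustration under that reading of the caption only: `project_qkv` and its arguments are illustrative names, and the Convex-r-Softmax operator and head combination are deliberately omitted because the caption does not define them.

```python
def project_qkv(X, W, d_q, d_k, d_v):
    """Hypothetical sketch of the Figure 2 projection.

    X:   list of token vectors (seq_len x d_in)
    W:   shared projection (d_model x d_in), all entries >= 0
    d_*: diagonals of the non-negative scaling matrices (length d_model)
    Returns Q, K, V, each seq_len x d_model, with Q = diag(d_q) W x per token.
    """
    assert all(w >= 0 for row in W for w in row), "W must be non-negative"
    assert all(d >= 0 for d in d_q + d_k + d_v), "scales must be non-negative"

    def scaled(x, d):
        h = [sum(w * a for w, a in zip(row, x)) for row in W]  # shared W x
        return [di * hi for di, hi in zip(d, h)]               # diag(d) W x

    Q = [scaled(x, d_q) for x in X]
    K = [scaled(x, d_k) for x in X]
    V = [scaled(x, d_v) for x in X]
    return Q, K, V


W = [[1.0, 0.0], [0.5, 0.5]]
Q, K, V = project_qkv([[1.0, 2.0]], W, [2.0, 1.0], [1.0, 1.0], [0.0, 3.0])
print(Q, K, V)  # [[2.0, 1.5]] [[1.0, 1.5]] [[0.0, 4.5]]
```

Because every entry of Q, K and V is a non-negative combination of the inputs, each is linear and non-decreasing in every input coordinate. This matters once blocks are stacked as in Figure 1: if the previous block's outputs are convex functions of the original input, a non-negative combination of them is still convex, whereas a single negative weight could break convexity.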