Hierarchical Control Framework Integrating LLMs with RL for Decarbonized HVAC Operation

Dianyu Zhong, Tian Xing, Kailai Sun, Xu Yang, Heye Huang, Irfan Qaisar, Tinggang Jia, Shaobo Wang, Qianchuan Zhao

Abstract

Heating, ventilation, and air conditioning (HVAC) systems account for a substantial share of building energy consumption. Environmental uncertainty and dynamic occupancy behavior make decarbonized HVAC control challenging. Reinforcement learning (RL) can optimize long-horizon comfort-energy trade-offs but suffers from exponential action-space growth and inefficient exploration in multi-zone buildings. Large language models (LLMs) can encode semantic context and operational knowledge, yet when used alone they lack reliable closed-loop numerical optimization and may therefore yield inconsistent comfort-energy trade-offs. To address these limitations, we propose a hierarchical control framework in which a fine-tuned LLM, trained on historical building operation data, generates state-dependent feasible action masks that prune the combinatorial joint action space into operationally plausible subsets. A masked value-based RL agent then performs constrained optimization within this reduced space, improving exploration efficiency and training stability. Evaluated in a high-fidelity simulator calibrated with real-world sensor and occupancy data from a 7-zone office building, the proposed method achieves a mean predicted percentage of dissatisfied (PPD) of 7.30%, corresponding to reductions of 39.1% relative to DQN, the best vanilla RL baseline in comfort, and 53.1% relative to the best vanilla LLM baseline, while reducing daily HVAC energy use to 140.90 kWh, lower than all vanilla RL baselines. The results suggest that LLM-guided action masking is a promising pathway toward efficient multi-zone HVAC control.
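The masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four fan-speed levels, the per-zone feasible sets, the random stand-in for a Q-network, and the names `joint_mask` and `masked_epsilon_greedy` are all assumptions made for illustration. Only the 7-zone setting and the use of a mask to restrict a value-based agent come from the text.

```python
import itertools
import random

N_ZONES = 7                # from the abstract: 7-zone office building
FAN_LEVELS = (0, 1, 2, 3)  # assumed off/low/medium/high; not stated in the source

# Enumerate every joint fan-speed action (4**7 = 16384 under the assumption above).
JOINT_ACTIONS = list(itertools.product(FAN_LEVELS, repeat=N_ZONES))


def joint_mask(feasible_per_zone):
    """Mark joint actions in which every zone's level is in that zone's allowed set."""
    return [
        all(level in allowed for level, allowed in zip(action, feasible_per_zone))
        for action in JOINT_ACTIONS
    ]


def masked_epsilon_greedy(q_values, mask, epsilon, rng):
    """Pick an action index, exploring and exploiting only inside the mask."""
    feasible = [i for i, ok in enumerate(mask) if ok]
    if not feasible:  # degenerate mask: fall back to the full action space
        feasible = list(range(len(q_values)))
    if rng.random() < epsilon:
        return rng.choice(feasible)
    return max(feasible, key=lambda i: q_values[i])


# Hypothetical LLM output: first four zones occupied, last three empty.
feasible_per_zone = [{1, 2}] * 4 + [{0}] * 3
mask = joint_mask(feasible_per_zone)

rng = random.Random(0)
q_values = [rng.uniform(-1.0, 1.0) for _ in JOINT_ACTIONS]  # stand-in for a Q-network
chosen = masked_epsilon_greedy(q_values, mask, epsilon=0.0, rng=rng)

print(sum(mask), "of", len(mask), "joint actions remain")
print("chosen joint action:", JOINT_ACTIONS[chosen])
```

Under these assumed feasible sets the mask keeps 2^4 = 16 of the 16384 joint actions, which shows why exploration inside the masked space can be much cheaper than over the full combinatorial space.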

Paper Structure

This paper contains 37 sections, 37 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: Overview of the proposed hierarchical LLM-RL framework for multi-zone HVAC control. Historical operation data are used to construct feasible-action labels and supervise the LLM, while the calibrated simulator provides the training and evaluation environment for the RL agent. During online control, the fine-tuned LLM generates a state-dependent feasible action mask, and the masked DQN selects the final joint FCU action within the reduced action space.
  • Figure 2: System instruction component of the prompt. This part specifies the control objective, operational context, and domain knowledge on HVAC actuation and spatial topology.
  • Figure 3: Formatting and state-serialization component of the prompt. The prompt enforces a JSON output schema and represents recent observations over a short temporal window, enabling the LLM to produce structured feasible fan-speed recommendations.
  • Figure 4: Layout of the office building [ZHONG2025116219].
  • Figure 5: HVAC system configuration. (a) FCUs of zones 5 to 7. (b) Sensors of FCU 1. (c) Video sampling of zone 7. (d) Sensors of water pumps. (e) Electrical cabinet. (f) Temperature sensor. (g) Video camera of zone 2 [ZHONG2025116219].
  • ...and 8 more figures
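
The captions for Figures 1 and 3 describe the LLM emitting structured, JSON-formatted feasible fan-speed recommendations that become an action mask. The sketch below shows one way such a reply might be validated before it reaches the RL agent. The schema (`feasible_fan_speeds`, `zone_1` to `zone_7`), the four fan levels, and the function `parse_feasible_sets` are hypothetical; the source does not give the actual schema.

```python
import json

N_ZONES = 7                # from the source: 7-zone office building
FAN_LEVELS = {0, 1, 2, 3}  # assumed fan-speed levels; not stated in the source


def parse_feasible_sets(llm_text):
    """Parse an LLM reply into one set of allowed fan levels per zone.

    Any zone that is missing, malformed, or empty falls back to all levels,
    so a bad reply widens the mask instead of blocking control.
    """
    fallback = [set(FAN_LEVELS) for _ in range(N_ZONES)]
    try:
        payload = json.loads(llm_text)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(payload, dict):
        return fallback
    zones = payload.get("feasible_fan_speeds")  # hypothetical field name
    if not isinstance(zones, dict):
        return fallback

    result = []
    for zone in range(1, N_ZONES + 1):
        levels = zones.get(f"zone_{zone}")      # hypothetical key format
        allowed = set()
        if isinstance(levels, list):
            allowed = {
                lv for lv in levels
                if isinstance(lv, int) and not isinstance(lv, bool) and lv in FAN_LEVELS
            }
        result.append(allowed or set(FAN_LEVELS))
    return result


# Hypothetical reply: zone 3 lists an invalid level, zones 4 to 7 are missing.
reply = '{"feasible_fan_speeds": {"zone_1": [1, 2], "zone_2": [0], "zone_3": [5]}}'
for zone, allowed in enumerate(parse_feasible_sets(reply), start=1):
    print(f"zone {zone}: {sorted(allowed)}")
```

Falling back to the full level set is one possible design choice: it keeps the downstream agent operable when the reply is unusable, at the cost of losing the pruning benefit for that step.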