Experimental evaluation of offline reinforcement learning for HVAC control in buildings

Jun Wang, Linyan Li, Qi Liu, Yu Yang

TL;DR

This work addresses the challenge of deploying offline reinforcement learning for building HVAC control by systematically evaluating state-of-the-art offline RL algorithms on two simulated environments and introducing an observation-history module to handle partial observability. It demonstrates that offline RL methods, particularly CQL, can learn effective HVAC policies from suboptimal offline data, achieving up to 28.5% reduction in indoor temperature violations and up to 12.1% energy savings relative to a baseline, while observation-history representations improve stability and performance. Key contributions include a practical end-to-end offline framework, a dataset-quality/quantity analysis using the regret ratio $\delta_{\tau}$, and insights into how history length and data diversity affect learning. The findings indicate significant potential for data-efficient, model-free HVAC control that leverages existing historical data, with implications for real-world deployment and energy management in buildings.

Abstract

Reinforcement learning (RL) techniques have been increasingly investigated for dynamic HVAC control in buildings. However, most studies explore solutions in online or off-policy scenarios without discussing in detail the feasibility or effectiveness of learning from purely offline datasets or trajectories. The lack of such work limits the real-world deployment of RL-based HVAC controllers, especially considering the abundance of historical data. To this end, this paper comprehensively evaluates the strengths and limitations of state-of-the-art offline RL algorithms through analytical and numerical studies. The analysis is conducted from two perspectives: algorithms and dataset characteristics. As a prerequisite, the necessity of applying offline RL algorithms is first confirmed in two building environments. The ability of observation history modeling to reduce violations and enhance performance is subsequently studied. Next, the performance of RL-based controllers, in terms of both constraint satisfaction and power consumption, is investigated under datasets of varying quality and quantity. Finally, the sensitivity to certain hyperparameters is evaluated. The results indicate that datasets of a certain suboptimality level and relatively small scale can be used to train a well-performing RL-based HVAC controller. Specifically, such controllers can reduce the violation ratio of indoor temperatures by up to 28.5% and achieve power savings of up to 12.1% compared to the baseline controller. In summary, this paper presents structured investigations and new findings on applying offline reinforcement learning to building HVAC systems.
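
The two headline metrics in the abstract can be illustrated with a minimal sketch. It assumes that the violation ratio is the fraction of time steps at which the indoor temperature lies outside a comfort band, and that power savings are measured relative to the baseline controller's consumption; this section does not give the paper's exact definitions, so the function names, the comfort bounds, and the numbers below are hypothetical.

```python
import numpy as np


def violation_ratio(temps, low, high):
    """Fraction of time steps whose temperature falls outside [low, high].

    Assumed definition for illustration; the paper may define it differently.
    """
    temps = np.asarray(temps, dtype=float)
    outside = (temps < low) | (temps > high)
    return float(outside.mean())


def relative_saving(baseline_energy, controller_energy):
    """Energy saved by the controller as a fraction of the baseline's energy."""
    return (baseline_energy - controller_energy) / baseline_energy


# Hypothetical indoor temperatures (deg C) and a hypothetical 21-25 deg C band.
ratio = violation_ratio([20.0, 23.0, 27.0, 22.0], low=21.0, high=25.0)

# Hypothetical energy totals (any consistent unit).
saving = relative_saving(baseline_energy=100.0, controller_energy=87.9)
```

With these made-up inputs, two of the four samples lie outside the band, and the controller uses 12.1 units less energy than a baseline of 100.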

Paper Structure (30 sections, 15 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: Illustration of (a) the training flow of off-policy RL algorithms, (b) the training flow of offline RL algorithms and (c) the difference between model-free and model-based offline RL algorithms.
  • Figure 2: Illustration of modeling the observation history. When only a partial observation $o_t$ is available in place of the fully observable state $s_t$, it is preferable to condition on the history sequence $h_t=o_{1:t}$.
  • Figure 3: Overview of the MixedUse facility.
  • Figure 4: Control flow of the 2-Zone DataCenter facility.
  • Figure 5: Comparisons of off-policy and offline RL algorithms in two different scenarios.
  • ...and 4 more figures
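
The idea in the Figure 2 caption, feeding a controller the history $h_t=o_{1:t}$ rather than the single observation $o_t$, can be sketched as follows. This is only an illustration under stated assumptions: it truncates the history to a fixed-length window and zero-pads the start of an episode, whereas the paper's observation-history module is not specified in this section and may be a learned sequence encoder. The class name `HistoryWindow` and its parameters are hypothetical.

```python
from collections import deque

import numpy as np


class HistoryWindow:
    """Keeps the last `length` observations and exposes them as one flat vector.

    Hypothetical helper: a fixed-window stand-in for the history h_t = o_{1:t}.
    """

    def __init__(self, obs_dim, length):
        if length < 1:
            raise ValueError("length must be at least 1")
        self.obs_dim = obs_dim
        self.length = length
        self.buffer = deque(maxlen=length)  # oldest entries drop automatically

    def reset(self):
        """Clear the history at the start of a new episode."""
        self.buffer.clear()

    def push(self, obs):
        """Append one observation and return the stacked history features."""
        obs = np.asarray(obs, dtype=np.float32)
        if obs.shape != (self.obs_dim,):
            raise ValueError(f"expected shape ({self.obs_dim},), got {obs.shape}")
        self.buffer.append(obs)
        return self.features()

    def features(self):
        """Return a vector of size length * obs_dim, zero-padded on the left."""
        pad = self.length - len(self.buffer)
        rows = [np.zeros(self.obs_dim, dtype=np.float32)] * pad + list(self.buffer)
        return np.concatenate(rows)


# Usage: a window of 3 two-dimensional observations.
window = HistoryWindow(obs_dim=2, length=3)
first = window.push([1.0, 2.0])  # two zero-padded slots, then the observation
```

The same stacking can be applied to every transition in an offline dataset before training, so that the policy input at step $t$ carries recent context rather than a single snapshot.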