Employing Federated Learning for Training Autonomous HVAC Systems

Fredrik Hagström; Vikas Garg; Fabricio Oliveira

TL;DR

The paper tackles the challenge of efficiently training autonomous HVAC controllers that generalize across diverse buildings. It integrates Soft Actor-Critic with federated learning (FedAvg, FedOpt, gradient masking) to train a global policy from data spread across multiple climate zones, while preserving data privacy. Empirical results in a simulated data-center environment show that federated training yields faster learning, better generalization to unseen buildings, and more stable performance than locally trained agents or PID baselines, with Adam as the most effective client optimizer. The work highlights practical benefits for deployment, including privacy preservation and reduced data requirements, while acknowledging remaining gaps to real-world pilots and suggesting directions like learning-rate scheduling and transfer learning.

Abstract

Buildings account for 40% of global energy consumption. A considerable portion of that consumption stems from heating, ventilation, and air conditioning (HVAC), so smart, energy-efficient HVAC systems have the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose due to their ability to learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. Their weakness, however, is relatively poor data efficiency: they require long periods of training to reach acceptable policies, which makes them unsuitable for direct deployment on real-world controllers. In this paper, we demonstrate that using federated learning to train the reinforcement learning controller of an HVAC system can improve both its learning speed and its ability to generalize, which in turn facilitates transfer learning to unseen building environments. In our setting, a global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to minimize energy consumption and maximize thermal comfort. We perform experiments evaluating three different optimizers for local policy training, and compare three different federated learning algorithms against two alternative baselines. Our experiments show that federated training leads to faster learning and to greater generalization capabilities in the federated policy compared to any individually trained policy. Furthermore, learning stability is significantly improved: the learning process and performance of the federated policy are less sensitive to the choice of parameters and to the inherent randomness of reinforcement learning.
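The aggregation step mentioned in the abstract, where a global policy is formed from locally trained policies, can be illustrated with a minimal FedAvg-style sketch. This is not the paper's implementation: the parameter names (`actor.w`, `actor.b`), the flat-list parameter layout, and the weighting by local sample count are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: each policy is a dict mapping a parameter name
# to a flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Return the sample-count-weighted average of the client parameters."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name, reference in client_params[0].items():
        # Average each coordinate, weighting every client by its sample count.
        global_params[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return global_params


# Two hypothetical clients (e.g. data centers in different climate zones).
clients = [
    {"actor.w": [1.0, 2.0], "actor.b": [0.0]},
    {"actor.w": [3.0, 4.0], "actor.b": [1.0]},
]
sizes = [1, 3]  # assumed number of local samples per client

global_policy = fedavg(clients, sizes)
print(global_policy)
```

In a full training loop the server would send `global_policy` back to every client for the next round of local training; only parameters, not raw building data, leave each site.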

Paper Structure (32 sections, 34 equations, 21 figures, 7 tables, 2 algorithms)

Introduction
Related Work
Early approaches
Value-based approaches
Policy-based approaches
Model-based approaches
Federated learning in the building domain
Methodology
Reinforcement Learning
Soft Actor-Critic
Federated learning
Federated Averaging
FedOpt
Gradient masking
Experiments
...and 17 more sections

Figures (21)

Figure 1: Graph of the zone thermal comfort reward $r_i$. The hyperparameters are set to $\lambda_g = 0.2$ and $\lambda_t = 0.1$, and the comfort range is bounded by $T_{min} = 18\,^{\circ}\mathrm{C}$ and $T_{max} = 27\,^{\circ}\mathrm{C}$. The target temperature is set to the midpoint of the comfort range, $T_{tgt} = 22.5\,^{\circ}\mathrm{C}$.
Figure 2: Progression of the energy consumption and comfort violation on the Helsinki evaluation environment of the FedAvg agent with different client optimizers.
Figure 3: Progression of the energy consumption and comfort violation on the Helsinki evaluation environment of FedAvg and independent agents with Adam as client optimizer. In the comfort violation plot we omit the outlier Antananarivo for the sake of legibility.
Figure 4: Progression of the energy consumption and comfort violation of FedAvg and independent agents on training environments Tokyo, AZ, CO and NY.
Figure 5: Progression of the energy consumption and comfort violation of FedAvg and independent agents on training environments Tokyo, AZ, CO and NY.
...and 16 more figures
