Employing Federated Learning for Training Autonomous HVAC Systems
Fredrik Hagström, Vikas Garg, Fabricio Oliveira
TL;DR
The paper tackles the challenge of efficiently training autonomous HVAC controllers that generalize across diverse buildings. It combines Soft Actor-Critic with federated learning (FedAvg, FedOpt, gradient masking) to train a global policy from data spread across multiple climate zones while preserving data privacy. Empirical results in a simulated data-center environment show that federated training yields faster learning, better generalization to unseen buildings, and more stable performance than locally trained agents or PID baselines, with Adam as the most effective client optimizer. The work highlights practical benefits for deployment, including privacy preservation and reduced data requirements. It also acknowledges the remaining gap between these simulations and real-world pilots, and suggests future directions such as learning-rate scheduling and transfer learning.
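The TL;DR names FedAvg and FedOpt as ways to aggregate client policies into a global one. As a rough illustration of what the server does in each round, the following is a minimal sketch, not the authors' implementation. It assumes each client's policy parameters are a flat list of floats and that clients are weighted by their local sample counts. The FedOpt variant shown is the server-side Adam instance (often called FedAdam); the helper names `fedavg` and `FedAdamServer` are hypothetical. Soft Actor-Critic client training and gradient masking are not shown.

```python
"""Hypothetical server-side aggregation for federated policy training.

Assumptions (not from the paper): parameters are flat lists of floats and
clients are weighted by how many samples they trained on.
"""
import math
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           num_samples: Sequence[int]) -> List[float]:
    """FedAvg: average client parameters, weighted by local sample counts."""
    total = sum(num_samples)
    avg = [0.0] * len(client_params[0])
    for params, n in zip(client_params, num_samples):
        weight = n / total
        for i, p in enumerate(params):
            avg[i] += weight * p
    return avg


class FedAdamServer:
    """FedOpt with Adam as the server optimizer.

    The weighted average of client updates is treated as a pseudo-gradient
    for the global parameters (no bias correction, as in FedAdam).
    """

    def __init__(self, params: Sequence[float], lr: float = 0.01,
                 beta1: float = 0.9, beta2: float = 0.99, tau: float = 1e-3):
        self.params = list(params)
        self.lr, self.beta1, self.beta2, self.tau = lr, beta1, beta2, tau
        self.m = [0.0] * len(self.params)  # first-moment estimate
        self.v = [0.0] * len(self.params)  # second-moment estimate

    def step(self, client_params: Sequence[Sequence[float]],
             num_samples: Sequence[int]) -> List[float]:
        avg = fedavg(client_params, num_samples)
        # Pseudo-gradient: how far the clients moved from the global model.
        delta = [a - g for a, g in zip(avg, self.params)]
        for i, d in enumerate(delta):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * d
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * d * d
            self.params[i] += self.lr * self.m[i] / (math.sqrt(self.v[i]) + self.tau)
        return self.params
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])` returns `[2.5, 3.5]`. Plain FedAvg simply replaces the global model with this average, whereas the FedAdam server moves only part of the way along the averaged update, scaled by its running moment estimates. Whether the paper weights clients by sample count or equally is an assumption here.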
Abstract
Buildings account for 40% of global energy consumption. A considerable portion of this consumption stems from heating, ventilation, and air conditioning (HVAC), so implementing smart, energy-efficient HVAC systems has the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose because they can learn and adapt purely from experience. They have been shown to outperform classical controllers in energy cost and consumption as well as in thermal comfort. Their weakness, however, is relatively poor data efficiency: they require long training periods to reach acceptable policies, which makes them impractical to deploy directly as real-world controllers. In this paper, we demonstrate that using federated learning to train the reinforcement learning controller of HVAC systems can improve both learning speed and generalization, which in turn facilitates transfer learning to unseen building environments. In our setting, a global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to minimize energy consumption while maximizing thermal comfort. We perform experiments evaluating three different optimizers for local policy training, as well as three different federated learning algorithms, against two alternative baselines. Our experiments show that federated training leads to faster learning and greater generalization in the federated policy than in any individually trained policy. Furthermore, learning stability is significantly improved: the learning process and performance of the federated policy are less sensitive to the choice of parameters and to the inherent randomness of reinforcement learning.
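The abstract states that the policy should minimize energy consumption while maximizing thermal comfort. One common way to express such a two-sided objective as a scalar reinforcement learning reward is a weighted sum of an energy penalty and a comfort-violation penalty. The sketch below assumes that form; the weight `w`, the scaling constants, the 18 to 27 °C comfort band, and the function names are all hypothetical and are not taken from the paper.

```python
"""Hypothetical scalar reward trading off energy use against thermal comfort.

Assumptions (not from the paper): a linear weighting w between the two terms,
power in watts, and comfort measured as degrees (°C) outside a band [t_low, t_high].
"""
from typing import Sequence


def comfort_violation(zone_temps: Sequence[float],
                      t_low: float = 18.0, t_high: float = 27.0) -> float:
    """Total degrees outside [t_low, t_high], summed over all zones."""
    return sum(max(t_low - t, 0.0) + max(t - t_high, 0.0) for t in zone_temps)


def hvac_reward(power_w: float, zone_temps: Sequence[float], w: float = 0.5,
                lambda_energy: float = 1e-4, lambda_comfort: float = 1.0) -> float:
    """Negative weighted cost: higher reward means less energy and fewer violations."""
    energy_term = lambda_energy * power_w
    comfort_term = lambda_comfort * comfort_violation(zone_temps)
    return -(w * energy_term + (1.0 - w) * comfort_term)
```

With these illustrative constants, a 10 kW draw with zone temperatures of 22 °C and 28 °C gives a reward of about -1.0 (energy term 1.0, comfort term 1.0 for the one degree above the band). Raising `w` shifts the learned trade-off toward saving energy at the expense of comfort.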
