Go Beyond Black-box Policies: Rethinking the Design of Learning Agent for Interpretable and Verifiable HVAC Control

Zhiyu An, Xianzhong Ding, Wan Du

TL;DR

This work redesigns HVAC controllers as decision trees extracted from existing thermal dynamics models and historical data. It introduces a novel verification criterion for RL agents in HVAC control based on domain knowledge, and develops a policy extraction procedure that produces a verifiable decision tree policy.

Abstract

Recent research has shown the potential of Model-based Reinforcement Learning (MBRL) to enhance the energy efficiency of Heating, Ventilation, and Air Conditioning (HVAC) systems. However, existing methods rely on black-box thermal dynamics models and stochastic optimizers, which lack reliability guarantees and pose risks to occupant health. In this work, we overcome the reliability bottleneck by redesigning HVAC controllers using decision trees extracted from existing thermal dynamics models and historical data. Our decision tree-based policies are deterministic, verifiable, interpretable, and more energy-efficient than current MBRL methods. First, we introduce a novel verification criterion for RL agents in HVAC control based on domain knowledge. Second, we develop a policy extraction procedure that produces a verifiable decision tree policy. We found that the high dimensionality of the thermal dynamics model input hinders the efficiency of policy extraction. To tackle the dimensionality challenge, we leverage importance sampling conditioned on historical data distributions, significantly improving policy extraction efficiency. Lastly, we present an offline verification algorithm that guarantees the reliability of a control policy. Extensive experiments show that our method saves 68.4% more energy and increases human comfort gain by 14.8% compared to the state-of-the-art method, in addition to a 1127x reduction in computation overhead. Our code and data are available at https://github.com/ryeii/Veri_HVAC
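
The policy extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `HISTORY`, `black_box_policy`, the noise level, and the tiny CART-style learner are all hypothetical stand-ins, and the sampler below simply resamples historical records with Gaussian noise, which is a simpler substitute for the importance sampling used in the paper. The idea it shows is that states are drawn near the historical data rather than uniformly over the full input space, labeled by the black-box controller, and then fitted with a decision tree.

```python
import random

# Hypothetical historical log of (hour of day, zone temperature in deg C).
HISTORY = [(8, 19.5), (10, 21.0), (12, 22.5), (14, 24.0), (16, 23.0), (20, 20.5)]

def black_box_policy(hour, temp):
    """Stand-in for the MBRL controller (an assumption, not the authors' model)."""
    if not 8 <= hour < 18:
        return 18.0                      # unoccupied: setback setpoint
    return 23.0 if temp < 21.0 else 21.0

def sample_states(n, noise_std, rng):
    """Resample historical records and perturb the temperature with Gaussian noise."""
    picks = (rng.choice(HISTORY) for _ in range(n))
    return [(hour, temp + rng.gauss(0.0, noise_std)) for hour, temp in picks]

def gini(rows):
    """Gini impurity of the setpoint labels in rows."""
    labels = [y for _, y in rows]
    return 1.0 - sum((labels.count(c) / len(labels)) ** 2 for c in set(labels))

def fit_tree(data, depth):
    """Tiny greedy CART-style classifier mapping (hour, temp) to a setpoint."""
    labels = [y for _, y in data]
    majority = max(set(labels), key=labels.count)
    if depth == 0 or len(set(labels)) == 1:
        return {"leaf": majority}
    best = None
    for f in range(2):                   # feature 0 = hour, feature 1 = temp
        values = sorted({x[f] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2          # midpoint between adjacent values
            left = [(x, y) for x, y in data if x[f] <= thr]
            right = [(x, y) for x, y in data if x[f] > thr]
            score = gini(left) * len(left) + gini(right) * len(right)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:                     # no feature has two distinct values
        return {"leaf": majority}
    _, f, thr, left, right = best
    return {"feature": f, "threshold": thr,
            "left": fit_tree(left, depth - 1), "right": fit_tree(right, depth - 1)}

def predict(tree, state):
    """Follow the decision path for state down to a leaf setpoint."""
    while "leaf" not in tree:
        branch = "left" if state[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

rng = random.Random(0)
states = sample_states(200, 0.5, rng)
dataset = [(s, black_box_policy(*s)) for s in states]
tree = fit_tree(dataset, depth=4)
fidelity = sum(predict(tree, s) == a for s, a in dataset) / len(dataset)
print(f"fidelity on sampled states: {fidelity:.2f}")
```

The resulting tree is deterministic and cheap to evaluate, which is what makes an offline check of every decision path feasible. Fidelity here is measured only on the sampled states, so it says nothing about regions the historical data never visits.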

Paper Structure (19 sections, 5 equations, 7 figures, 3 tables, 1 algorithm)

Figures (7)

  • Figure 1: Left: the distribution of setpoints over 10 runs on a fixed set of disturbances of one day. Right: the distribution of setpoints in the left figure.
  • Figure 2: Left: our proposed procedure. Right: an illustration of a DT with two variables (time and temp). The leaf nodes are classified into three categories based on temperature. The decision path verification algorithm detects and corrects failed nodes.
  • Figure 3: Preliminary experiment to determine the appropriate noise level.
  • Figure 4: Building control results.
  • Figure 5: Our method's behavior example.
  • ...and 2 more figures
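
The decision path verification described in the Figure 2 caption can be sketched as follows. This is a hedged illustration, not the paper's algorithm: the comfort band, the hand-written `TREE`, the three category names, and the pass/fail rule (a cold leaf must not set a setpoint below the band, a hot leaf must not set one above it, a comfortable leaf must stay inside it) are all assumptions chosen for the example. Each leaf is classified by the temperature interval its path implies, checked against the rule, and clamped into the band if it fails.

```python
import math

COMFORT_LOW, COMFORT_HIGH = 20.0, 23.5   # hypothetical comfort band (deg C)

# Hand-written tree: feature 0 = hour, feature 1 = zone temperature.
TREE = {
    "feature": 1, "threshold": 20.0,
    "left": {"leaf": 19.0},              # cold zone, setpoint below the band
    "right": {
        "feature": 1, "threshold": 23.5,
        "left": {"leaf": 22.0},
        "right": {
            "feature": 0, "threshold": 18.0,
            "left": {"leaf": 21.0},
            "right": {"leaf": 26.0},     # hot zone, setpoint above the band
        },
    },
}

def leaves(node, box=None):
    """Yield (leaf, box); box maps each feature to the (low, high] interval of its path."""
    box = box or {0: (-math.inf, math.inf), 1: (-math.inf, math.inf)}
    if "leaf" in node:
        yield node, box
        return
    f, thr = node["feature"], node["threshold"]
    lo, hi = box[f]
    yield from leaves(node["left"], {**box, f: (lo, min(hi, thr))})
    yield from leaves(node["right"], {**box, f: (max(lo, thr), hi)})

def category(temp_interval):
    """Classify a leaf by where its temperature interval sits relative to the band."""
    lo, hi = temp_interval
    if hi <= COMFORT_LOW:
        return "cold"
    if lo >= COMFORT_HIGH:
        return "hot"
    return "comfortable"

def passes(cat, setpoint):
    """Hypothetical criterion: the setpoint must not push the zone away from comfort."""
    if cat == "cold":
        return setpoint >= COMFORT_LOW
    if cat == "hot":
        return setpoint <= COMFORT_HIGH
    return COMFORT_LOW <= setpoint <= COMFORT_HIGH

def verify_and_correct(tree):
    """Check every leaf offline; clamp failed setpoints into the band and report them."""
    failed = []
    for leaf, box in leaves(tree):
        cat = category(box[1])
        if not passes(cat, leaf["leaf"]):
            failed.append((cat, leaf["leaf"]))
            leaf["leaf"] = min(max(leaf["leaf"], COMFORT_LOW), COMFORT_HIGH)
    return failed

failed = verify_and_correct(TREE)
print(failed)
```

Because the tree has finitely many leaves, one pass over them covers every input the policy can receive, and a second call to `verify_and_correct` on the corrected tree reports no failures.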

Theorems & Definitions (1)

  • proof