Distill2Explain: Differentiable decision trees for explainable reinforcement learning in energy application controllers

Gargya Gokhale, Seyed Soroush Karimi Madahi, Bert Claessens, Chris Develder

TL;DR

This paper tackles the challenge of explainable reinforcement learning for residential energy management by distilling a standard RL policy into differentiable decision trees (DDTs). The two-stage method first trains a DQN teacher and then uses policy distillation to obtain shallow DDTs that can be read as clear if-then-else rules. Empirical results on a battery-based home energy management scenario show that DDTs achieve performance comparable to the teacher with a substantially smaller compute footprint, and outperform a baseline rule-based controller (RBC) by about 20–25%. The work demonstrates the practical potential of edge-deployable, explainable RL controllers for energy systems, while identifying stability and scalability considerations for future extensions.

Abstract

Demand-side flexibility is gaining importance as a crucial element in the energy transition process. Accounting for about 25% of final energy consumption globally, the residential sector is an important (potential) source of energy flexibility. However, unlocking this flexibility requires developing a control framework that (1) easily scales across different houses, (2) is easy to maintain, and (3) is simple to understand for end-users. A potential control framework for such a task is data-driven control, specifically model-free reinforcement learning (RL). Such RL-based controllers learn a good control policy by interacting with their environment, learning purely based on data and with minimal human intervention. Yet, they lack explainability, which hampers user acceptance. Moreover, the limited hardware capabilities of residential assets form a hurdle (e.g., for running deep neural networks). To overcome both of these challenges, we propose a novel method to obtain explainable RL policies by using differentiable decision trees. Using a policy distillation approach, we train these differentiable decision trees to mimic standard RL-based controllers, leading to a decision tree-based control policy that is data-driven and easy to explain. As a proof-of-concept, we examine the performance and explainability of our proposed approach in a battery-based home energy management system to reduce energy costs. For this use case, we show that our proposed approach can outperform baseline rule-based policies by about 20–25%, while providing simple, explainable control policies. We further compare these explainable policies with standard RL policies and examine the performance trade-offs associated with this increased explainability.

Paper Structure (33 sections, 8 equations, 6 figures, 3 tables, 2 algorithms)

Figures (6)

  • Figure 1: Illustration of a DDT of depth 2. The rounded boxes depict the decision nodes and the rectangles depict leaf nodes. All $p_{i}$ represent the path probabilities and $p^{L}_{jk}$ denotes the leaf probability distributions.
  • Figure 2: Performance of DDT-based students as a HEMS on different price scenarios. The dots represent the actual performance of individual models and the box plots show the aggregate performance. The student agents are benchmarked against the teacher agent "DQN" and an RBC.
  • Figure 3: Visual representation of learned decision trees of depth 2 for both price scenarios. The decision nodes are depicted with unshaded boxes and contain the learned features and the threshold values. The leaf nodes are depicted by grey boxes and contain the learned distribution. The annotations highlight the actions related to each leaf node.
  • Figure 4: Visualizing the trained policies of the DDT- and DQN-based agents on a simplified HEMS scenario. The heatmaps show the actions chosen by the agents for different values of state-of-charge and price across different demand regions. The bottom row depicts the DQN policy and the top rows show the policies of our proposed DDT-based controllers.
  • Figure 5: Example of a learned DDT for depth 3 for the real-world BELPEX price scenario.
  • ...and 1 more figures
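
The depth-2 DDT described in the Figure 1 caption can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes each decision node is a sigmoid over a linear function of the state and each leaf holds a softmax distribution over actions, which is a common DDT formulation but is not confirmed by the text above. The function name `ddt_depth2`, the parameter layout, and the example state are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ddt_depth2(state, nodes, leaf_logits):
    """Soft forward pass through a depth-2 tree (assumed formulation).

    nodes: three (weights, bias) pairs for the root, left child, right child.
    leaf_logits: four lists of action logits, leaves ordered left to right.
    Returns (path_probs, action_probs).
    """
    def go_left(node):
        weights, bias = node
        return sigmoid(sum(w * s for w, s in zip(weights, state)) + bias)

    root, left, right = (go_left(n) for n in nodes)

    # Probability of reaching each leaf: product of the branch
    # probabilities along its root-to-leaf path.
    path_probs = [
        root * left,
        root * (1.0 - left),
        (1.0 - root) * right,
        (1.0 - root) * (1.0 - right),
    ]

    # Each leaf holds its own distribution over actions.
    leaf_dists = [softmax(logits) for logits in leaf_logits]

    # Tree output: leaf distributions weighted by path probabilities.
    n_actions = len(leaf_logits[0])
    action_probs = [
        sum(p * dist[a] for p, dist in zip(path_probs, leaf_dists))
        for a in range(n_actions)
    ]
    return path_probs, action_probs


# Hypothetical example: state = (state-of-charge, price), three actions.
example_nodes = [([0.0, 4.0], -2.0), ([3.0, 0.0], -1.5), ([-3.0, 0.0], 1.5)]
example_leaves = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
paths, actions = ddt_depth2([0.5, 0.2], example_nodes, example_leaves)
```

Because every operation is differentiable, a tree of this form could be trained by gradient descent to match a teacher policy's outputs; pushing each sigmoid towards a hard 0/1 decision is what would yield if-then-else rules of the kind the paper reports.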