Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning

Senne Deproost, Dennis Steckelmacher, Ann Nowé

TL;DR

The paper addresses the opacity of deep reinforcement learning controllers by introducing Voronoi state partitioning to distill locally specialized linear subpolicies from a trained DRL agent. A kd-tree based region mapping enables efficient inference, while periodic splitting and merging of regions refine the policy boundaries to balance simplicity with expressiveness. Empirical validation on a gridworld-like navigation task and MountainCarContinuous shows that the distilled locally-linear policies can closely track or even exceed the performance of the original DRL policy, while providing interpretable regional behavior. The approach offers a practical path toward explainable RL in safety- and regulation-sensitive domains, though it acknowledges limitations in high-dimensional state spaces and the interpretability of Voronoi boundaries. Future work includes axis-aligned region definitions and broader controller-style substitutions beyond linear models.
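
The region lookup described above can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`build_kdtree`, `nearest`, `region_of`, `act`), the four codewords, and the subpolicy weights are all hypothetical, chosen only to show how a kd-tree over codeword points can map a state to its Voronoi region and then apply that region's linear subpolicy.

```python
def build_kdtree(points, depth=0):
    """Build a kd-tree from a list of (region_index, coords) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][1])          # cycle through the state dimensions
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2                    # median point becomes the split node
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (squared_distance, region_index) of the codeword closest to query."""
    if node is None:
        return best
    (idx, coords), axis, left, right = node
    d2 = sum((q - c) ** 2 for q, c in zip(query, coords))
    if best is None or d2 < best[0]:
        best = (d2, idx)
    diff = query[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best


def region_of(state, tree):
    """Voronoi region of a state = index of its nearest codeword."""
    return nearest(tree, state)[1]


def act(state, tree, subpolicies):
    """Apply the linear subpolicy (weights, bias) of the state's region."""
    weights, bias = subpolicies[region_of(state, tree)]
    return sum(w * s for w, s in zip(weights, state)) + bias


# Hypothetical 2-D example: four codewords, one linear subpolicy per region.
codewords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
subpolicies = [((1.0, 0.0), 0.0), ((2.0, 0.0), -1.0),
               ((0.0, 1.0), 0.5), ((-1.0, -1.0), 2.0)]
tree = build_kdtree(list(enumerate(codewords)))
print(region_of((0.9, 0.1), tree), act((0.9, 0.1), tree, subpolicies))
```

Each region's behavior can be read directly from its weights and bias, which is the kind of regional interpretability the summary refers to. The splitting and merging of regions is not shown here.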

Abstract

Deep Reinforcement Learning is one of the state-of-the-art methods for producing near-optimal system controllers. However, deep RL algorithms train a deep neural network that lacks transparency, which poses challenges when the controller has to meet regulations or foster trust. To alleviate this, one could transfer the learned behaviour into a model that is human-readable by design using knowledge distillation. Often this is done with a single model, which mimics the original model on average but could struggle in more dynamic situations. A key challenge is that this simpler model should strike the right balance between flexibility and complexity, or between bias and accuracy. We propose a new model-agnostic method to divide the state space into regions in which a simplified, human-understandable model can operate. In this paper, we use Voronoi partitioning to find regions where linear models can achieve similar performance to the original controller. We evaluate our approach on a gridworld environment and a classic control task. We observe that our proposed distillation to locally-specialized linear models produces policies that are explainable, and we show that the distilled policies match or even slightly outperform the black-box policy they are distilled from.
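
To make the contrast between a single distilled model and locally-specialized ones concrete, here is a minimal sketch under strong assumptions: a 1-D state, a stand-in teacher policy `abs` (not a trained DRL agent), fixed hypothetical codewords, and closed-form ordinary least squares as the fitting procedure. The paper's actual fitting procedure and region refinement may differ; all function names are illustrative.

```python
def nearest_region(state, codewords):
    """Index of the closest codeword, i.e. the 1-D Voronoi cell of the state."""
    return min(range(len(codewords)), key=lambda i: abs(state - codewords[i]))


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    return w, mean_y - w * mean_x


def distill(teacher, states, codewords):
    """Fit one linear model per region on the teacher's actions in that region."""
    buckets = {i: ([], []) for i in range(len(codewords))}
    for s in states:
        xs, ys = buckets[nearest_region(s, codewords)]
        xs.append(s)
        ys.append(teacher(s))
    return {i: fit_line(xs, ys) for i, (xs, ys) in buckets.items()}


def max_error(teacher, states, codewords, models):
    """Worst-case gap between the teacher and the distilled policy on the samples."""
    worst = 0.0
    for s in states:
        w, b = models[nearest_region(s, codewords)]
        worst = max(worst, abs(teacher(s) - (w * s + b)))
    return worst


# Stand-in teacher: a nonlinear policy that no single line can match.
states = [i / 10 for i in range(-20, 21)]
single = distill(abs, states, [0.0])          # one region, one global linear model
local = distill(abs, states, [-1.0, 1.0])     # two regions, two linear models
print("single model max error:", max_error(abs, states, [0.0], single))
print("two regions max error: ", max_error(abs, states, [-1.0, 1.0], local))
```

In this toy setting the single line can only match the teacher on average, while two regions placed on either side of the kink each recover a simple, readable rule.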

Paper Structure

This paper contains 30 sections, 2 equations, 6 figures, 6 tables, 3 algorithms.

Figures (6)

  • Figure 1: An example of a Voronoi diagram using 8 codeword points.
  • Figure 2: The two environments used to validate our method.
  • Figure 3: Spread of performance in terms of achieved return (higher is better). The DRL agent is evaluated on 1000 episodes, while 85 instances of our method were each evaluated on the same 1000 episodes, for a total of 85000 runs. We observe similar spread and performance in both settings.
  • Figure 4: On SimpleGoal: original black-box policy learned with TD3, and the result of its distillation to explainable locally-linear policies.
  • Figure 5: Spread of performance in terms of achieved return (higher is better). The DRL agent is evaluated on 1000 episodes, while 40 instances of our method were each evaluated on the same 1000 episodes, for a total of 40000 runs. We observe a larger spread for our method and a higher median performance than the DRL agent.
  • ...and 1 more figure