Clustered Policy Decision Ranking

Mark Levin, Hana Chockler

TL;DR

The paper addresses the interpretability gap in reinforcement learning by introducing a black-box method that clusters policy decisions based on their contribution to reward and ranks these clusters using a covariance-inspired approach. It combines random sampling, a modified TF-IDF vectorization, and PCA-based clustering to identify small, informative state groups; pruned policies built from top-ranked clusters are evaluated without retraining to assess impact on performance. Experiments on MiniGrid and Atari show that cluster-based pruning can retain much of the original reward and, in several cases, outperform traditional SBFL baselines, highlighting the method's potential for explainable RL. The work advocates using a portfolio of pruning techniques and suggests that better state encoders could further improve clustering quality and interpretability.
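The pruning step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the toy corridor environment, the `cluster_of` mapping from states to cluster labels, and the choice of `WALK` as the default action taken outside the kept clusters. The paper's own environments, clustering, and default-action choice may differ.

```python
from typing import Callable, Dict, Set

JUMP, WALK = 1, 0   # WALK is the assumed default action
PITS = {2, 5}       # toy cells where the decision actually matters
LENGTH = 8          # number of cells in the toy corridor


def make_pruned_policy(
    policy: Callable[[int], int],
    cluster_of: Dict[int, str],
    kept_clusters: Set[str],
    default_action: int = WALK,
) -> Callable[[int], int]:
    """Use the trained policy only in states whose cluster is kept."""
    def pruned(state: int) -> int:
        if cluster_of.get(state) in kept_clusters:
            return policy(state)
        return default_action
    return pruned


def run_episode(policy: Callable[[int], int]) -> float:
    """The agent advances one cell per step and must jump over every pit."""
    for cell in range(LENGTH):
        if cell in PITS and policy(cell) != JUMP:
            return 0.0  # fell into a pit
    return 1.0          # reached the end of the corridor


def trained_policy(state: int) -> int:
    """Stand-in for a trained agent that jumps everywhere, needed or not."""
    return JUMP


# Hypothetical clustering: pit cells in one cluster, all other cells in another.
cluster_of = {s: ("pit" if s in PITS else "flat") for s in range(LENGTH)}

# The original policy is reused as-is; nothing is retrained.
reward_pit_only = run_episode(make_pruned_policy(trained_policy, cluster_of, {"pit"}))
reward_flat_only = run_episode(make_pruned_policy(trained_policy, cluster_of, {"flat"}))
print(reward_pit_only, reward_flat_only)
```

In this toy setting, keeping only the "pit" cluster preserves the full reward, while keeping only the "flat" cluster loses it. That is the kind of contrast a cluster ranking is meant to expose.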

Abstract

Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward and how significant their contribution is. Given a trained policy, we propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states. We compare our measure against a previous statistical fault localization based ranking procedure.
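One possible reading of "statistical covariance estimation" is the following. Repeatedly sample a random subset of clusters in which the policy's own decisions are kept, record the reward obtained, and score each cluster by the sample covariance between its inclusion indicator and the reward. The sketch below follows that reading only. The function `reward_of` is a hypothetical stand-in for rolling out a pruned policy, the inclusion probability of 0.5 is an assumption, and the estimator actually used in the paper may differ.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple


def rank_clusters_by_covariance(
    clusters: List[str],
    reward_of: Callable[[Set[str]], float],
    n_samples: int = 2000,
    seed: int = 0,
) -> Tuple[List[str], Dict[str, float]]:
    """Rank clusters by the sample covariance of (cluster kept?, reward)."""
    rng = random.Random(seed)
    masks, rewards = [], []
    for _ in range(n_samples):
        # Keep each cluster independently with probability 0.5 (an assumption).
        mask = {c: rng.random() < 0.5 for c in clusters}
        masks.append(mask)
        rewards.append(reward_of({c for c, keep in mask.items() if keep}))

    r_bar = mean(rewards)
    scores = {}
    for c in clusters:
        x = [1.0 if m[c] else 0.0 for m in masks]
        x_bar = mean(x)
        # Unbiased sample covariance between inclusion and reward.
        scores[c] = sum(
            (xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, rewards)
        ) / (n_samples - 1)

    ranking = sorted(clusters, key=lambda c: scores[c], reverse=True)
    return ranking, scores


def toy_reward(kept: Set[str]) -> float:
    """Hypothetical black box: 'A' matters a lot, 'B' a little, 'C' not at all."""
    return 1.0 * ("A" in kept) + 0.2 * ("B" in kept)


ranking, scores = rank_clusters_by_covariance(["A", "B", "C"], toy_reward)
print(ranking)
```

The procedure only needs rewards from rollouts, so it treats the policy as a black box, in line with the abstract.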

Paper Structure

This paper contains 16 sections, 6 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: A flowchart of the method, using MiniGrid (gym_minigrid) as an example.
  • Figure 2: Bowling Reward by States Restored
  • Figure 3: Bowling Reward by Actions Restored
  • Figure 4: Krull Reward by States Restored
  • Figure 5: Krull Reward by Actions Restored