
Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning

Hector Kohler, Quentin Delfosse, Riad Akrour, Kristian Kersting, Philippe Preux

TL;DR

INTERPRETER addresses trust and misalignment in deep reinforcement learning by distilling neural policies into compact, editable Python tree programs built from oblique decision trees. The method imitates neural oracles via $Q$-Dagger or Dagger, converts the best oblique tree into readable code, and enables human interventions through straightforward edits across diverse tasks, including Atari, MuJoCo, and a real-world soil-fertilization scenario. Empirical results show competitive or superior performance with small trees (as few as 16–64 leaves) and fast inference, while user studies and editing demonstrations highlight interpretability and practical editability. These findings suggest a practical path toward trustworthy RL systems with transparent, modifiable policies that can be aligned with human values and domain knowledge.

Abstract

Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, and undermines the trust necessary for real-world deployment. So far, solutions learning interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tRee Programs for ReinforcEmenT lEaRning. We empirically demonstrate that INTERPRETER's compact tree programs match oracles across a diverse set of sequential decision tasks, and we evaluate the impact of our design choices on interpretability and performance. We show that our policies can be interpreted and edited to correct misalignments on Atari games and to explain real farming strategies.
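To make the idea of an "editable tree program" concrete, here is a minimal sketch of what such a policy could look like for a Pong-like task. It is an illustration only, not code from the paper: the feature names (`ball_y`, `paddle_y`, `ball_vy`), the coefficients, the thresholds, and the action labels are all assumptions. Each test compares a linear combination of features to a threshold, which is what makes the tree oblique rather than axis-parallel.

```python
# Hypothetical oblique tree policy written as plain Python.
# All feature names, weights, and thresholds are illustrative assumptions.

def tree_policy(ball_y: float, paddle_y: float, ball_vy: float) -> str:
    """Return an action from nested oblique tests on the state features."""
    # Oblique test: a weighted sum of several features, not a single feature.
    score = (ball_y - paddle_y) + 0.5 * ball_vy
    if score > 1.0:
        return "DOWN"
    else:
        if score < -1.0:
            return "UP"
        else:
            return "NOOP"


def edited_policy(ball_y: float, paddle_y: float, ball_vy: float) -> str:
    """Same tree with one human edit: a guard rule placed before the learned tests."""
    # Hypothetical expert edit: never push the paddle past y = 100.
    if paddle_y > 100.0:
        return "UP"
    return tree_policy(ball_y, paddle_y, ball_vy)


if __name__ == "__main__":
    print(tree_policy(10.0, 5.0, 0.0))       # ball below paddle
    print(edited_policy(120.0, 110.0, 0.0))  # guard rule overrides the tree
```

Because the policy is ordinary code, the edit is a two-line change that a domain expert can read, review, and revert without retraining.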

Paper Structure (20 sections, 12 figures, 3 tables, 1 algorithm)


Figures (12)

  • Figure 1: INTERPRETER provides editable, interpretable policies as Python tree programs, illustrated on the Swimmer (left) and Pong (right) environments.
  • Figure 2: INTERPRETER's distillation process. The MDP state-action space is simplified (idle features and equivalent actions are masked), then an oblique tree policy imitates the oracle. Finally, the policy is translated to readable and executable code that experts can verify and edit.
  • Figure 3: Oracle decision rules are oblique, illustrated with PPO on different state-space partitions of the Pong environment. Decision boundaries are both oblique and axis-parallel.
  • Figure 4: INTERPRETER matches oracles thanks to its design choices. From left to right: ablated INTERPRETER, INTERPRETER with different oracles and imitations, performance and runtimes.
  • Figure 5: $Q$-Dagger does not improve sampling, shown by the similar loss (to Dagger) during the extraction for different oracles and imitations.
  • ...and 7 more figures
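
The imitation step named in the Figure 2 caption can be sketched as a Dagger-style loop: roll out the current student, label every visited state with the oracle's action, aggregate the data, and refit. The sketch below is a toy under stated assumptions and is not the paper's implementation: the 1-D environment, the hand-written `oracle`, and the one-node axis-parallel stump standing in for an oblique tree are all hypothetical, and it uses plain Dagger rather than $Q$-Dagger.

```python
import random

def oracle(state: float) -> int:
    """Stand-in for the neural oracle: push the state toward zero."""
    return 1 if state < 0.0 else 0  # 1 = move right, 0 = move left

def step(state: float, action: int) -> float:
    """Toy 1-D dynamics (an assumption, not an environment from the paper)."""
    return state + (0.1 if action == 1 else -0.1)

def fit_stump(data):
    """Fit 'action = 1 if state < t else 0' by minimizing imitation errors."""
    best_t, best_err = 0.0, len(data) + 1
    for t in sorted(s for s, _ in data):
        err = sum((1 if s < t else 0) != a for s, a in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def dagger(iterations: int = 5, horizon: int = 20, seed: int = 0):
    rng = random.Random(seed)
    data = []          # aggregated (state, oracle action) pairs
    threshold = None   # no student yet: the first rollout follows the oracle
    for _ in range(iterations):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            # The oracle labels every visited state, whoever is acting.
            data.append((state, oracle(state)))
            if threshold is None:
                action = oracle(state)
            else:
                action = 1 if state < threshold else 0
            state = step(state, action)
        # Refit the student on everything collected so far.
        threshold = fit_stump(data)
    return threshold, data

if __name__ == "__main__":
    threshold, data = dagger()
    print(f"learned rule: move right if state < {threshold:.3f}")
```

The key design point is that states are collected under the student's own behavior while labels always come from the oracle, so the student is trained on the states it will actually encounter.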