Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies

Kexin Gu Baugh, Luke Dickens, Alessandra Russo

TL;DR

This work tackles the interpretability gap in deep RL by introducing neural DNF-MT, a differentiable neuro-symbolic actor that learns policies end-to-end while enabling direct translation to interpretable logic programs. By incorporating mutex-tanh activation, the model supports both probabilistic (ProbLog) and deterministic (ASP) policy representations, and it offers predicate invention through encoder layers. The approach enables bidirectional translation between neural policies and logical rules, allowing manual policy intervention without re-training. Empirical results across diverse tasks show competitive performance relative to black-box baselines while providing actionable, human-readable policy explanations. The work also discusses limitations of thresholding in post-training and outlines directions for improving robust rule extraction and intervention workflows.
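As a rough illustration of how a single activation could serve both the probabilistic and the deterministic reading, the sketch below assumes that mutex-tanh is defined as 2 * softmax(x) - 1. That definition is not stated in this summary and should be checked against the paper; the function names `softmax`, `mutex_tanh`, and `to_probabilities` are hypothetical. Under this assumption every output lies in (-1, 1), and mapping it back with (a + 1) / 2 recovers a probability distribution over actions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mutex_tanh(logits):
    """ASSUMED definition (not given in this summary): 2 * softmax(x) - 1.

    Each output lies in (-1, 1), like tanh, but the outputs are coupled
    through the softmax so that they compete with one another.
    """
    return [2.0 * p - 1.0 for p in softmax(logits)]


def to_probabilities(activations):
    """Map activations in (-1, 1) back to probabilities in (0, 1)."""
    return [(a + 1.0) / 2.0 for a in activations]


# Hypothetical pre-activation values for three actions.
acts = mutex_tanh([2.0, 0.5, -1.0])
probs = to_probabilities(acts)
```

Here `probs` can be read as a stochastic policy over the three actions, while the sign of each entry in `acts` gives a bivalent (true/false) reading of the same output.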

Abstract

Although deep reinforcement learning has been shown to be effective, the model's black-box nature presents barriers to direct policy interpretation. To address this problem, we propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning. The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training. At the same time, its architecture is designed so that trained models can be directly translated into interpretable policies expressed as standard (bivalent or probabilistic) logic programs. Moreover, additional layers can be included to extract abstract features from complex observations, acting as a form of predicate invention. The logic representations are highly interpretable, and we show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model, facilitating manual intervention and adaptation of learned policies. We evaluate our approach on a range of tasks requiring learning deterministic or stochastic behaviours from various forms of observations. Our empirical results show that our neural DNF-MT model performs at the level of competing black-box methods whilst providing interpretable policies.

Paper Structure

This paper contains 35 sections, 2 theorems, 30 equations, 12 figures, and 7 tables.

Key Result

Proposition 1

Given a conjunctive semi-symbolic node that satisfies Conditions (condition:neural-to-asp-weight) and its translated ASP rule with rule head $h$ based on Translation (eq:translation-neural-to-asp-rule), and an input tensor $\mathbf{x}$ that satisfies Condition (condition:neural-to-asp-input) and its

Figures (12)

  • Figure 1: Neural DNF-MT model as an actor in actor-critic PPO, in environments with discrete observations.
  • Figure 2: Neural DNF-MT model as an actor in actor-critic PPO, in environments with complex observations, such as an image-like multi-dimensional matrix.
  • Figure 3: Post-training processing to extract an interpretable logical policy from a trained neural DNF-MT actor. There are two branches: one with sub-label (a) for extracting a stochastic policy in ProbLog and the other with sub-label (b) for extracting a deterministic policy in ASP.
  • Figure 4: Mean episodic return (y-axis) $\pm$ standard error for the baselines and neural DNF-MT models, together with the ProbLog/ASP programs extracted from the corresponding neural DNF-MT models. All Q-tables are trained using Q-learning, and all MLP actors are trained with actor-critic PPO. The neural DNF-MT actors are also trained with actor-critic PPO, except in the Taxi environment, where the neural DNF-MT actor is distilled from a trained MLP actor (shown with a dashed border and faded colour). Symbols after an actor's name indicate the action selection method: * for argmax action selection, $\dagger$ for $\epsilon$-greedy sampling, and $\ddagger$ for sampling from the actor's distribution. The same results are also reported in Table \ref{tab:rl-performance} in the Appendix.
  • Figure 5: Small corridor (SC), the same as the one in Sutton and Barto's reinforcement learning book.
  • ...and 7 more figures
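
The three action selection methods named in the Figure 4 caption (argmax, $\epsilon$-greedy, and sampling from the actor's distribution) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names and the example values are hypothetical.

```python
import random


def argmax_action(scores):
    """Return the index of the highest-scoring action (ties go to the lowest index)."""
    return max(range(len(scores)), key=lambda a: scores[a])


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return argmax_action(q_values)


def sample_action(probs, rng):
    """Draw an action index according to the actor's action distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical values for a three-action environment.
rng = random.Random(0)
action_probs = [0.1, 0.7, 0.2]

greedy = argmax_action(action_probs)                      # deterministic choice
explore = epsilon_greedy_action(action_probs, 0.1, rng)   # mostly greedy
sampled = sample_action(action_probs, rng)                # stochastic choice
```

Argmax selection yields a deterministic policy, whereas sampling from the distribution preserves stochastic behaviour, which matters in tasks where the best policy is itself stochastic.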

Theorems & Definitions (7)

  • Definition 3.1: Generalised Belief Logic
  • Definition 3.2: Logical mutual exclusivity
  • Definition 3.3: Probabilistic mutual exclusivity
  • Remark 1
  • Remark 2
  • Proposition 1
  • Proposition 2