Three Pathways to Neurosymbolic Reinforcement Learning with Interpretable Model and Policy Networks

Peter Graf; Patrick Emami

TL;DR

This work addresses how to build reinforcement learning (RL) agents that are both differentiable and interpretable by integrating neurosymbolic architectures into RL, referred to here as neurosymbolic RL (NSRL). It presents three pathways: model-free RL with Differentiable Decision Trees (DDTs), model-based RL that uses Logical Neural Networks (LNNs) to formulate classical planning problems, and differentiable predictive control with LNNs in a differentiable simulator. The pathways are demonstrated on building energy management (BEM) with the OCHRE Gym environment. The study highlights key trade-offs. Differentiability aids learning, but classical logic is discrete and can hinder optimization. Interpretability tends to favor symbolic rules, yet scalability and discretization pose challenges. The results show that rule-based controllers remain strong baselines in some time-of-use (TOU) scenarios, while learned neurosymbolic policies can adapt and offer interpretability benefits, especially when warm starts and symbolic planning are employed. This points to a promising fusion of symbolic and continuous approaches for real-world control. The work lays groundwork for scalable NSRL in complex, uncertain systems and suggests directions for future enhancements, including large language model integration to guide rule discovery and refinement.

Abstract

Neurosymbolic AI combines the interpretability, parsimony, and explicit reasoning of classical symbolic approaches with the statistical learning of data-driven neural approaches. Models and policies that are simultaneously differentiable and interpretable may be key enablers of this marriage. This paper demonstrates three pathways to implementing such models and policies in a real-world reinforcement learning setting. Specifically, we study a broad class of neural networks that build interpretable semantics directly into their architecture. We reveal and highlight both the potential and the essential difficulties of combining logic, simulation, and learning. One lesson is that learning benefits from continuity and differentiability, but classical logic is discrete and non-differentiable. The relaxation to real-valued, differentiable representations presents a trade-off; the more learnable, the less interpretable. Another lesson is that using logic in the context of a numerical simulation involves a non-trivial mapping from raw (e.g., real-valued time series) simulation data to logical predicates. Some open questions this note exposes include: What are the limits of rule-based controllers, and how learnable are they? Do the differentiable interpretable approaches discussed here scale to large, complex, uncertain systems? Can we truly achieve interpretability? We highlight these and other themes across the three approaches.
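
The trade-off between learnability and interpretability can be illustrated with a small sketch. It assumes a sigmoid relaxation of a threshold predicate and Łukasiewicz-style connectives, which are one common choice of real-valued logic; the abstract does not commit to a particular relaxation. The function names, the `sharpness` parameter, and the temperature values are hypothetical and are not taken from the paper.

```python
import math


def soft_less_than(x: float, threshold: float, sharpness: float) -> float:
    """Relax the crisp predicate `x < threshold` to a truth value in [0, 1].

    Larger `sharpness` gives values closer to 0 or 1 (more rule-like).
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (threshold - x)))


def soft_less_than_grad(x: float, threshold: float, sharpness: float) -> float:
    """Derivative of the relaxed truth value with respect to `threshold`."""
    t = soft_less_than(x, threshold, sharpness)
    return sharpness * t * (1.0 - t)


def luk_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction; agrees with Boolean AND on {0, 1}."""
    return max(0.0, a + b - 1.0)


def luk_or(a: float, b: float) -> float:
    """Lukasiewicz disjunction; agrees with Boolean OR on {0, 1}."""
    return min(1.0, a + b)


if __name__ == "__main__":
    indoor_temp, desired_temp = 18.0, 20.0  # hypothetical values, degrees C
    for sharpness in (1.0, 20.0):
        truth = soft_less_than(indoor_temp, desired_temp, sharpness)
        grad = soft_less_than_grad(indoor_temp, desired_temp, sharpness)
        print(f"sharpness={sharpness}: truth={truth:.4f}, grad={grad:.2e}")
```

With these hypothetical numbers, the sharper predicate is closer to a crisp true/false value, but its gradient with respect to the threshold is far smaller, so there is less signal for learning the threshold.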

Paper Structure (22 sections, 2 equations, 7 figures, 3 tables)

Introduction
Explicitly Interpretable Neural Network Architectures
Illustrative test cases in BEM
Related Work
Background
Differentiable Decision Trees
Logical Neural Networks
RL Paradigms
Experiment
OCHRE Gym
Setup
Results
Test Case 1: Model-based RL with LNNs and classical planning
Setup
Demonstration
...and 7 more sections

Figures (7)

Figure 1: Details of a 4-leaf DDT. (Left) The indexing scheme of the mathematical formulation. (Right) Our baseline rule-based controller (RBC; see the DDT setup section) as a DDT from which we can initialize learning.
Figure 2: Pseudocode of the three methods for achieving symbolic RL explored in this paper. (Left) Model-free RL; in our case the policy is a DDT. (Middle) Model-based RL, with an LNN model used to build and solve a classical planning problem. (Right) Direct optimization integrating a differentiable simulation and an LNN policy.
Figure 3: Comparison of rule-based, deep RL (DRL), and DDT-based controllers on a 30-day TOU task. The DRL and DDT controllers were trained (and the RBC controller tuned) on data for June. The figure shows the 30-day cost when these controllers were used, unmodified, across 9 other months. The "warmstart" DDTs are (approximately) initialized with the RBC "precool" policy, whereas the "coldstart" DDTs are randomly initialized.
Figure 4: Definitions of a vocabulary in terms of raw simulation data. For example, "cold" is defined as the simulation parameter "Temperature - Indoor (C)" being less than the Python variable mydesiredtemp. Similarly, the logical action "turn_heat_on" is defined as setting the OCHRE control variable "Load Fraction" to a value of 1.
Figure 5: Workflow of the OCHRE to LNN to PDDL model-based RL framework. See text for details.
...and 2 more figures
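
The vocabulary described in the Figure 4 caption can be sketched as a pair of lookup tables that map logical names to raw simulation quantities. This is a minimal sketch under stated assumptions, not the authors' implementation: the value of `mydesiredtemp`, the table layout, and the helpers `ground` and `to_controls` are hypothetical, and the control dictionary is a plain Python dict rather than OCHRE's actual control interface. Only the names "cold", "turn_heat_on", "Temperature - Indoor (C)", and "Load Fraction" come from the caption.

```python
from typing import Callable, Dict

mydesiredtemp = 20.0  # hypothetical setpoint in degrees C

Observation = Dict[str, float]

# Logical predicate name -> test on one record of raw simulation data.
PREDICATES: Dict[str, Callable[[Observation], bool]] = {
    "cold": lambda obs: obs["Temperature - Indoor (C)"] < mydesiredtemp,
}

# Logical action name -> control variables and the values to set.
ACTIONS: Dict[str, Dict[str, float]] = {
    "turn_heat_on": {"Load Fraction": 1.0},
}


def ground(obs: Observation) -> Dict[str, bool]:
    """Evaluate every predicate on a raw observation."""
    return {name: test(obs) for name, test in PREDICATES.items()}


def to_controls(action: str) -> Dict[str, float]:
    """Translate a logical action into control settings (returns a copy)."""
    return dict(ACTIONS[action])


if __name__ == "__main__":
    observation = {"Temperature - Indoor (C)": 18.5}  # hypothetical reading
    print(ground(observation))
    print(to_controls("turn_heat_on"))
```

Keeping the mapping in data rather than in control flow means the same tables can serve both directions: grounding simulation output into predicates for the logical model, and translating a planned logical action back into simulator controls.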
