Online inductive learning from answer sets for efficient reinforcement learning exploration
Celeste Veronese, Daniele Meli, Alessandro Farinelli
TL;DR
The paper tackles reinforcement learning inefficiency and opaque decision-making by introducing online neurosymbolic learning that uses inductive logic programming to extract human-interpretable policy heuristics from batches of experience and ASP reasoning to bias subsequent exploration. The method learns a logical approximation of the agent's policy online via ILP (FastLAS) and applies ASP-based reasoning to steer exploration, while preserving RL convergence through a probabilistic soft bias instead of reward shaping. Empirical validation on Pac-Man across two maps shows substantial improvements in discounted return with modest computational overhead and rapid convergence of the learned heuristics. The work demonstrates a scalable, explainable neurosymbolic integration for online RL that can extend to other domains and learning algorithms.
Abstract
This paper presents a novel approach combining inductive logic programming with reinforcement learning to improve training performance and explainability. We exploit inductive learning of answer set programs from noisy examples to learn, at each batch of experience, a set of logical rules that form an explainable approximation of the agent policy. We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent in the next batch. This avoids inefficient reward shaping and preserves optimality through a soft bias. The entire procedure is conducted during the online execution of the reinforcement learning algorithm. We preliminarily validate the efficacy of our approach by integrating it into the Q-learning algorithm for the Pac-Man scenario in two maps of increasing complexity. Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training. Moreover, inductive learning does not appreciably increase the computational time required by Q-learning, and the learned rules quickly converge to an explanation of the agent policy.
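The soft-bias idea can be illustrated with a minimal sketch. It is not the authors' implementation: the ASP reasoning step is replaced by a plain Python set `suggested`, and the names `choose_action`, `q_update`, and the bias probability `rho` are assumptions introduced here for illustration. The point shown is that biased exploration only re-weights the exploratory choice, so every action keeps a nonzero probability and the reward is left untouched.

```python
import random
from collections import defaultdict


def choose_action(q_values, actions, suggested, epsilon, rho, rng):
    """Epsilon-greedy selection with a soft bias toward suggested actions.

    q_values:  dict mapping each action to its Q estimate in the current state
    suggested: actions recommended by the learned rules (hypothetical stand-in
               for the output of answer set reasoning)
    epsilon:   probability of taking an exploratory step
    rho:       probability of following the suggestion while exploring
               (assumed parameter; rho < 1 keeps the bias soft)
    """
    if rng.random() < epsilon:
        candidates = [a for a in actions if a in suggested]
        if candidates and rng.random() < rho:
            return rng.choice(candidates)
        # Fall back to uniform exploration, so no action is ever excluded.
        return rng.choice(actions)
    # Exploit: pick the action with the highest current Q estimate.
    return max(actions, key=lambda a: q_values[a])


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Standard tabular Q-learning update; the reward is not shaped."""
    best_next = max(q[next_state][a] for a in actions)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


if __name__ == "__main__":
    actions = ["up", "down", "left", "right"]
    q = defaultdict(lambda: defaultdict(float))
    rng = random.Random(0)
    # Pretend the learned rules recommend moving left in state "s0".
    a = choose_action(q["s0"], actions, {"left"}, epsilon=0.3, rho=0.8, rng=rng)
    q_update(q, "s0", a, reward=1.0, next_state="s1",
             actions=actions, alpha=0.5, gamma=0.9)
    print(a, q["s0"][a])
```

Keeping `rho` strictly below 1 is the design choice that matters in this sketch: a wrong or incomplete rule can slow exploration down but cannot permanently hide an action from the agent.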
