EXPIL: Explanatory Predicate Invention for Learning in Games
Jingyuan Sha, Hikaru Shindo, Quentin Delfosse, Kristian Kersting, Devendra Singh Dhami
TL;DR
EXPIL addresses the interpretability bottleneck in reinforcement learning by automatically discovering explanatory predicates from a pretrained agent's replay buffer and using them to construct first-order-logic policies. It introduces Necessity and Sufficiency predicates to evaluate and refine these concepts, uses beam search to form weighted policy clauses, and trains the resulting policy with actor-critic updates. Across Getout, Loot, and Threefish, EXPIL matches or surpasses both a neural PPO agent and the state-of-the-art neuro-symbolic (NeSy) baseline while requiring minimal hand-crafted priors, enabling explainable behavior in relational environments. This work advances interpretable, robust RL by reducing the dependency on predefined background knowledge and by providing a scalable framework for explanatory predicate invention when learning to play games.
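The TL;DR compresses two mechanisms into one sentence: scoring candidate predicates by necessity and sufficiency, and assembling clauses with beam search. The Python sketch below illustrates that combination on a toy, Getout-inspired replay buffer. It is not EXPIL's implementation. The definitions of `sufficiency` (how often the action follows when the clause body holds) and `necessity` (how often the body holds when the action is taken), the use of their product as a clause weight, the state encoding, and all predicate names are illustrative assumptions, and the actor-critic training step is omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical symbolic state: object name -> (x, y) position.
State = Dict[str, Tuple[float, float]]
# One replay-buffer entry: (state, action chosen by the pretrained agent).
Transition = Tuple[State, str]
Predicate = Callable[[State], bool]


def sufficiency(body: Sequence[Predicate], action: str, buffer: List[Transition]) -> float:
    """Assumed definition: share of states satisfying `body` in which the agent chose `action`."""
    covered = [a for s, a in buffer if all(p(s) for p in body)]
    return sum(a == action for a in covered) / len(covered) if covered else 0.0


def necessity(body: Sequence[Predicate], action: str, buffer: List[Transition]) -> float:
    """Assumed definition: share of states where the agent chose `action` that satisfy `body`."""
    chosen = [s for s, a in buffer if a == action]
    return sum(all(p(s) for p in body) for s in chosen) / len(chosen) if chosen else 0.0


def beam_search(candidates: Dict[str, Predicate], action: str, buffer: List[Transition],
                beam_width: int = 2, max_len: int = 2) -> List[Tuple[Tuple[str, ...], float]]:
    """Grow conjunctive clause bodies for `action`, keeping the `beam_width` best per round.

    Score = sufficiency * necessity (a hypothetical stand-in for a clause weight).
    Ties are broken alphabetically so the result is deterministic.
    """
    def score(names: Tuple[str, ...]) -> float:
        body = [candidates[n] for n in names]
        return sufficiency(body, action, buffer) * necessity(body, action, buffer)

    beam: List[Tuple[str, ...]] = [()]
    scored: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        # Extend each body by one unused predicate; sorted tuples remove duplicates.
        expanded = {tuple(sorted(b + (n,))) for b in beam for n in candidates if n not in b}
        ranked_round = sorted(expanded, key=lambda names: (-score(names), names))
        beam = ranked_round[:beam_width]
        scored.extend((names, score(names)) for names in beam)
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy buffer: the agent moves toward the key (hypothetical data).
buffer: List[Transition] = [
    ({"player": (5, 0), "key": (2, 0)}, "left"),
    ({"player": (5, 0), "key": (8, 0)}, "right"),
    ({"player": (3, 0), "key": (1, 0)}, "left"),
    ({"player": (4, 0), "key": (9, 0)}, "right"),
]
# Hand-written candidate predicates, standing in for invented ones.
candidates: Dict[str, Predicate] = {
    "key_left": lambda s: s["key"][0] < s["player"][0],
    "key_right": lambda s: s["key"][0] > s["player"][0],
    "player_right_half": lambda s: s["player"][0] > 3.5,
}
ranked = beam_search(candidates, "left", buffer)
for names, weight in ranked:
    print(" & ".join(names), "->", "left", round(weight, 3))
```

On this buffer, `key_left` alone receives weight 1.0 for `left`, while conjoining `player_right_half` halves its necessity because one `left` state has the player in the left half. In EXPIL the candidate predicates are themselves invented rather than hand-written; they are fixed here only to keep the example small.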
Abstract
Reinforcement learning (RL) has proven to be a powerful tool for training agents that excel in various games. However, the black-box nature of neural network models often hinders our ability to understand the reasoning behind the agent's actions. Recent research has attempted to address this issue by using pretrained neural agents to guide the learning of logic-based policies, allowing for interpretable decisions. A drawback of such approaches is that they require large amounts of predefined background knowledge in the form of predicates, limiting their applicability and scalability. In this work, we propose a novel approach, Explanatory Predicate Invention for Learning in Games (EXPIL), that identifies and extracts predicates from a pretrained neural agent and then uses them in logic-based agents, reducing the dependency on predefined background knowledge. Our experimental evaluation on various games demonstrates the effectiveness of EXPIL in achieving explainable behavior in logic agents while requiring less background knowledge.
