Reinforcement Learning with Knowledge Representation and Reasoning: A Brief Survey
Chao Yu, Shicheng Ye, Hankz Hankui Zhuo
TL;DR
Reinforcement learning (RL) often suffers from low sample efficiency, poor generalization, and gaps in safety and interpretability. The paper surveys how Knowledge Representation and Reasoning (KRR) methods, such as Reward Machines (RMs) for non-Markovian rewards, temporal-logic automata, Answer Set Programming, Markov Logic Networks, and planning formalisms, can be integrated with RL to address these problems. It groups the literature into three strands (efficiency, generalization, and safety/interpretability) and details representative approaches such as QRM/HRM, PEORL/SDRL, temporal logics (TLs) including MITL, RMs for transfer, and safety monitors, while highlighting open problems such as extending beyond LTL and integrating LLMs. Overall, the survey points to a future where AI agents combine high-level reasoning with exploratory RL to achieve scalable, verifiable, and adaptable behavior across complex tasks.
Abstract
Reinforcement Learning (RL) has advanced tremendously in recent years, but it still faces significant obstacles in addressing complex real-life problems because of poor system generalization, low sample efficiency, and safety and interpretability concerns. The core reason for these difficulties is that most work has focused on the computational aspect of value functions or policies, using a representational model to describe atomic components such as rewards, states, and actions, and has thus neglected the rich high-level declarative domain knowledge of facts, relations, and rules that can be either provided a priori or acquired through reasoning over time. Recently, there has been rapidly growing interest in the use of Knowledge Representation and Reasoning (KRR) methods, usually based on logical languages, to enable more abstract representation and more efficient learning in RL. In this survey, we provide a preliminary overview of these endeavors that leverage the strengths of KRR to help solve various problems in RL, and we discuss the challenging open problems and possible directions for future work in this area.
