Automating the Refinement of Reinforcement Learning Specifications
Tanmay Ambadkar, Đorđe Žikelić, Abhinav Verma
TL;DR
AutoSpec tackles under-specified reinforcement learning tasks by automatically refining coarse SpectRL specifications into sound, more informative ones that guide learning. It introduces four refinement operators that act on the abstract graph of a SpectRL specification, and shows how a wrapper framework can iteratively improve policy satisfaction thresholds when integrated with DiRL and LSTS. Empirical results on 9-rooms, randomized grids, and PandaGym show notable gains in learnability and robustness, including in high-dimensional settings. The approach advances practical specification-guided RL by enabling automated, principled refinement of task specifications, while acknowledging inherent limitations in completeness and exploration requirements.
Abstract
Logical specifications have been shown to help reinforcement learning algorithms achieve complex tasks. However, when a task is under-specified, agents might fail to learn useful policies. In this work, we explore the possibility of improving coarse-grained logical specifications via an exploration-guided strategy. We propose \textsc{AutoSpec}, a framework that searches for a logical specification refinement whose satisfaction implies satisfaction of the original specification, but which provides additional guidance, thereby making it easier for reinforcement learning algorithms to learn useful policies. \textsc{AutoSpec} is applicable to reinforcement learning tasks specified via the SpectRL specification logic. We exploit the compositional nature of specifications written in SpectRL and design four refinement procedures that modify the abstract graph of the specification, either by refining its existing edge specifications or by introducing new edge specifications. We prove that all four procedures maintain specification soundness, i.e., any trajectory satisfying the refined specification also satisfies the original. We then show how \textsc{AutoSpec} can be integrated with existing reinforcement learning algorithms for learning policies from logical specifications. Our experiments demonstrate that using the refined logical specifications produced by \textsc{AutoSpec} yields promising improvements in the complexity of control tasks that can be solved.
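
The soundness property stated above (every trajectory satisfying the refined specification also satisfies the original) can be illustrated with a small sketch. This is a toy model under stated assumptions, not SpectRL syntax and not the AutoSpec implementation: specifications are modeled as plain predicates over finite trajectories on a 3x3 grid, and the names `reach`, `then`, `is_sound_on`, `GOAL`, and `WAYPOINT` are hypothetical and introduced only for illustration. The check is an exhaustive test over short trajectories, not a proof.

```python
from itertools import product
from typing import Callable, Iterable, Sequence, Tuple

State = Tuple[int, int]
Trajectory = Sequence[State]
Spec = Callable[[Trajectory], bool]  # assumption: a spec is a predicate on trajectories


def reach(target: State) -> Spec:
    """Toy 'reach' spec: satisfied if the trajectory visits `target`."""
    return lambda traj: target in traj


def then(first: Spec, second: Spec) -> Spec:
    """Toy sequencing: some prefix satisfies `first`, the remaining suffix `second`."""
    def check(traj: Trajectory) -> bool:
        return any(first(traj[:i]) and second(traj[i:])
                   for i in range(len(traj) + 1))
    return check


def is_sound_on(refined: Spec, original: Spec,
                trajectories: Iterable[Trajectory]) -> bool:
    """True if every listed trajectory satisfying `refined` also satisfies `original`."""
    return all(original(t) for t in trajectories if refined(t))


GOAL: State = (2, 2)       # hypothetical goal cell
WAYPOINT: State = (1, 1)   # hypothetical intermediate cell added by a refinement

original = reach(GOAL)                         # coarse spec: eventually reach the goal
refined = then(reach(WAYPOINT), reach(GOAL))   # refined spec: waypoint first, then goal

# Enumerate all state sequences of length 1 to 3 on a 3x3 grid.
states = list(product(range(3), repeat=2))
trajectories = [list(t) for n in range(1, 4) for t in product(states, repeat=n)]

print("refined implies original:", is_sound_on(refined, original, trajectories))
print("original implies refined:", is_sound_on(original, refined, trajectories))
```

In this toy setting the refinement is strictly stronger: a trajectory that goes straight to the goal without passing the waypoint satisfies the original specification but not the refined one, which is why soundness holds in one direction only and completeness is not guaranteed.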
