Guided Exploration for Efficient Relational Model Learning
Annie Feng, Nishanth Kumar, Tomás Lozano-Pérez, Leslie Pack Kaelbling
TL;DR
This paper tackles the challenge of efficiently learning relational models in large-scale, long-horizon domains by extending goal-literal babbling (GLIB) with oracle-driven operator initialization and goal guidance. It formalizes relational learning via environments $\langle \mathcal{S}, \mathcal{A}, H, T, \mathcal{P} \rangle$ and tasks $\langle I, \mathcal{O}, G \rangle$, and introduces Baking-Large to stress-test scalability. The core contribution is a data-collection framework with two components: oracle demonstrations that initialize a sufficient set of lifted operator effects, and precondition-targeting guidance that steers exploration toward informative transitions. Together these significantly improve sample efficiency and generalization over GLIB and a BFS-based oracle. The findings suggest that combining demonstrations with targeted exploration can enable accurate relational model learning in complex domains, which matters for planning-based agents operating in realistic environments. As future work, the paper points to large language models and human-robot teaching as ways to approximate the oracle components in real-world settings, with the aim of scalable autonomous learning of relational planners.
Abstract
Efficient exploration is critical for learning relational models in large-scale environments with complex, long-horizon tasks. Random exploration methods often collect redundant or irrelevant data, limiting their ability to learn accurate relational models of the environment. Goal-literal babbling (GLIB) improves upon random exploration by setting and planning to novel goals, but its reliance on random actions and random novel goal selection limits its scalability to larger domains. In this work, we identify the principles underlying efficient exploration in relational domains: (1) operator initialization with demonstrations that cover the distinct lifted effects necessary for planning and (2) refining preconditions to collect maximally informative transitions by selecting informative goal-action pairs and executing plans to them. To demonstrate these principles, we introduce Baking-Large, a challenging domain with extensive state-action spaces and long-horizon tasks. We evaluate methods using oracle-driven demonstrations for operator initialization and precondition-targeting guidance to efficiently gather critical transitions. Experiments show that both the oracle demonstrations and precondition-targeting oracle guidance significantly improve sample efficiency and generalization, paving the way for future methods to use these principles to efficiently learn accurate relational models in complex domains.
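To make principle (1) concrete, the following is a minimal sketch of how distinct lifted effects might be extracted from demonstration transitions. It is an illustration under stated assumptions, not the paper's implementation: states are assumed to be sets of ground literals, the lifting scheme (variables assigned to action arguments first, then to any remaining objects) is a simplification, and the predicate, action, and function names (`put-in`, `pick-up`, `lift_effects`, `distinct_lifted_effects`) are hypothetical and not taken from Baking-Large.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Literal = Tuple[str, ...]   # (predicate, obj1, obj2, ...)
State = FrozenSet[Literal]
Action = Tuple[str, ...]    # (action_name, obj1, obj2, ...)
Transition = Tuple[State, Action, State]
LiftedEffect = Tuple[str, FrozenSet[Literal], FrozenSet[Literal]]


def lift_effects(transition: Transition) -> LiftedEffect:
    """Return (action name, lifted add effects, lifted delete effects)."""
    before, action, after = transition
    add, delete = after - before, before - after

    # Assumption: objects are replaced by variables in a canonical order,
    # action arguments first, then any other objects appearing in effects.
    var_of: Dict[str, str] = {}
    for obj in action[1:]:
        var_of.setdefault(obj, f"?x{len(var_of)}")
    for lit in sorted(add | delete):
        for obj in lit[1:]:
            var_of.setdefault(obj, f"?x{len(var_of)}")

    def lift(lit: Literal) -> Literal:
        return (lit[0],) + tuple(var_of[o] for o in lit[1:])

    return (action[0],
            frozenset(lift(l) for l in add),
            frozenset(lift(l) for l in delete))


def distinct_lifted_effects(demo: Iterable[Transition]) -> Set[LiftedEffect]:
    """Collect the set of distinct lifted effects seen in a demonstration."""
    return {lift_effects(t) for t in demo}


# Hypothetical demonstration: two put-in steps on different objects and
# one pick-up step. The two put-in steps share a single lifted effect.
demo = [
    (frozenset({("holding", "egg1"), ("empty", "bowl1")}),
     ("put-in", "egg1", "bowl1"),
     frozenset({("in", "egg1", "bowl1")})),
    (frozenset({("holding", "flour1"), ("empty", "bowl2")}),
     ("put-in", "flour1", "bowl2"),
     frozenset({("in", "flour1", "bowl2")})),
    (frozenset({("on-table", "egg1")}),
     ("pick-up", "egg1"),
     frozenset({("holding", "egg1")})),
]
effects = distinct_lifted_effects(demo)
print(len(effects))
```

In this toy setting, each distinct lifted effect would seed one initial operator, leaving its preconditions to be refined by later targeted exploration as described in principle (2).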
