VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B. Tenenbaum, Tom Silver, João F. Henriques, Kevin Ellis
TL;DR
The paper addresses robot planning by learning Neuro-Symbolic Predicates (NSPs), which provide task-specific, perceptually grounded abstractions. It proposes an online method that alternates between VLM-based predicate invention, predicate scoring, and goal-directed exploration inside a planner-driven loop. The result is an abstract world model defined by predicates and high-level actions (HLAs). The approach is evaluated on five simulated robotic domains, where it shows better sample efficiency, stronger generalization to unseen tasks, and improved interpretability compared with hierarchical reinforcement learning, VLM-based planning, and traditional symbolic predicates. The results indicate that autonomously learned NSPs can approach the performance of an oracle that uses manually designed abstractions, supporting robust, long-horizon robot planning in visually rich and variable environments.
Abstract
Broadly intelligent agents should form task-specific abstractions that selectively expose the essential elements of a task, while abstracting away the complexity of the raw sensorimotor space. In this work, we present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations. We outline an online algorithm for inventing such predicates and learning abstract world models. We compare our approach to hierarchical reinforcement learning, vision-language model planning, and symbolic predicate invention approaches, on both in- and out-of-distribution tasks across five simulated robotic domains. Results show that our approach offers better sample complexity, stronger out-of-distribution generalization, and improved interpretability.
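To make the idea of a first-order predicate that mixes symbolic and neural components concrete, here is a hedged sketch. Everything in it is an assumption for illustration: the class `NeuroSymbolicPredicate`, the helper `stub_vlm_query` (a placeholder for a neural perception query, not a real model call), the object features, and the `On` predicate itself. It shows only the general shape: a named predicate with an arity, a classifier that combines a symbolic feature test with a perception query, and grounding over all object tuples.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, Set, Tuple

Obj = Dict[str, float]  # hypothetical per-object feature dictionary


def stub_vlm_query(question: str, features: Obj) -> bool:
    """Placeholder for a neural perception query (assumption, not a real VLM)."""
    return features.get("overlap", 0.0) > 0.5


@dataclass(frozen=True)
class NeuroSymbolicPredicate:
    name: str
    arity: int
    classifier: Callable[..., bool]

    def holds(self, *objs: Obj) -> bool:
        """Evaluate the predicate on a tuple of objects of the right arity."""
        if len(objs) != self.arity:
            raise ValueError(f"{self.name} expects {self.arity} objects")
        return self.classifier(*objs)


def _on(top: Obj, bottom: Obj) -> bool:
    # Symbolic part: compare numeric object features directly.
    above = top["z"] > bottom["z"]
    # Neural part (stubbed): ask a perception model about the scene.
    touching = stub_vlm_query(
        "Is the top object resting on the bottom object?",
        {"overlap": top["overlap_with_below"]},
    )
    return above and touching


ON = NeuroSymbolicPredicate(name="On", arity=2, classifier=_on)


def ground_atoms(pred: NeuroSymbolicPredicate,
                 objects: Dict[str, Obj]) -> Set[Tuple[str, ...]]:
    """First-order grounding: test the predicate on every ordered object tuple."""
    return {
        names
        for names in permutations(objects, pred.arity)
        if pred.holds(*(objects[n] for n in names))
    }


objects: Dict[str, Obj] = {
    "a": {"z": 0.2, "overlap_with_below": 0.9},
    "b": {"z": 0.1, "overlap_with_below": 0.0},
    "c": {"z": 0.3, "overlap_with_below": 0.1},
}
```

Here `ground_atoms(ON, objects)` yields the set of object-name pairs for which `On` holds. Because the predicate is defined over object arguments rather than a fixed scene, the same definition applies unchanged to scenes with more or different objects.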
