RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg, Hooshang Nayyeri, Shenlong Wang, Yunzhu Li
TL;DR
RoboEXP tackles the challenge of interactive scene exploration by constructing an action-conditioned 3D scene graph (ACSG) that encodes both spatial structure and action-dependent relationships. The system integrates perception, memory, decision-making, and action modules powered by a Large Multimodal Model to autonomously explore and incrementally build the ACSG, enabling manipulation across rigid, articulated, nested, and deformable objects. Experiments in tabletop and room settings show that RoboEXP outperforms GPT-4V baselines in constructing complete ACSGs and guiding downstream tasks, and that it remains resilient to occlusion and intervention. The ACSG serves as a structured representation for planning and executing complex manipulation in unknown environments, a step toward practical household and office robotics.
Abstract
We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG) that captures the structure of the underlying environment. The ACSG accounts for both low-level information (geometry and semantics) and high-level information (action-conditioned relationships between different entities) in the scene. To this end, we present the Robotic Exploration (RoboEXP) system, which incorporates a Large Multimodal Model (LMM) and an explicit memory design to enhance its capabilities. The robot reasons about what to explore and how to explore it, accumulating new information through interaction and incrementally constructing the ACSG. Leveraging the constructed ACSG, we illustrate the effectiveness and efficiency of our RoboEXP system in facilitating a wide range of real-world manipulation tasks involving rigid, articulated, nested, and deformable objects.
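To make the idea of an action-conditioned scene graph more concrete, the following is a minimal Python sketch of one way such a graph could be stored and queried. It is an illustration only, not the RoboEXP implementation: the class names (`Node`, `ACSG`), the edge labels (`"affords"`, `"reveals"`), and the helper `actions_to_reach` are all hypothetical and chosen for this example. The sketch assumes that objects and actions are both nodes, that an object "affords" an action, and that an action "reveals" objects that were hidden before it was executed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node (hypothetical schema): either an object or an action."""
    name: str
    kind: str  # "object" or "action"


@dataclass
class ACSG:
    """Toy action-conditioned scene graph; not the authors' data structure."""
    nodes: dict = field(default_factory=dict)
    # Edges are (source, relation, target) triples.
    edges: list = field(default_factory=list)

    def add_object(self, name):
        self.nodes[name] = Node(name, "object")

    def add_action(self, name, acts_on):
        # The object `acts_on` affords this action (assumed relation name).
        self.nodes[name] = Node(name, "action")
        self.edges.append((acts_on, "affords", name))

    def add_revealed(self, action, obj):
        # Executing `action` exposes `obj` (assumed relation name).
        self.add_object(obj)
        self.edges.append((action, "reveals", obj))

    def actions_to_reach(self, target):
        """Return the actions, in execution order, needed to expose `target`."""
        plan, current, seen = [], target, set()
        while current not in seen:
            seen.add(current)  # guard against accidental cycles
            revealing = [s for s, r, t in self.edges
                         if r == "reveals" and t == current]
            if not revealing:
                break  # `current` is directly visible
            action = revealing[0]
            plan.append(action)
            owners = [s for s, r, t in self.edges
                      if r == "affords" and t == action]
            if not owners:
                break
            current = owners[0]
        return list(reversed(plan))


# Example: a spoon inside a box inside a cabinet (a nested-object scene).
graph = ACSG()
graph.add_object("cabinet")
graph.add_action("open_cabinet", acts_on="cabinet")
graph.add_revealed("open_cabinet", "box")
graph.add_action("open_box", acts_on="box")
graph.add_revealed("open_box", "spoon")
plan = graph.actions_to_reach("spoon")
```

In this toy scene, `plan` lists the cabinet-opening action before the box-opening action, which mirrors how a downstream task could read the preconditions for reaching a hidden object off the graph. A real system would additionally attach geometry and semantic features to each object node.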
