Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
Oliver Lemke, Zuria Bauer, René Zurbrügg, Marc Pollefeys, Francis Engelmann, Hermann Blum
TL;DR
The paper presents Spot-Compose, a modular framework that combines open-vocabulary 3D instance segmentation (OpenMask3D), grasp pose estimation (AnyGrasp), and adaptive navigation to enable dynamic object retrieval and drawer manipulation in human-centric environments. It demonstrates a real-world pipeline on the Spot robot that localizes arbitrary objects from natural-language queries, computes grasp poses together with suitable robot positioning, and estimates drawer motion axes to access concealed spaces. The main contributions are a Spot-based integration platform, an end-to-end capability for open-vocabulary object interaction in 3D scenes, and empirical results showing a 51% success rate for dynamic object retrieval and an 82% success rate for drawer opening across varied scenes and objects. This work highlights the practical potential of combining 3D perception, manipulation, and motion planning, using commodity 3D scanners and mobile robots, to operate in everyday human environments.
Abstract
In recent years, modern techniques in deep learning and large-scale datasets have led to impressive progress in 3D instance segmentation, grasp pose estimation, and robotics. This allows for accurate detection directly in 3D scenes, object- and environment-aware grasp prediction, as well as robust and repeatable robotic manipulation. This work aims to integrate these recent methods into a comprehensive framework for robotic interaction and manipulation in human-centric environments. Specifically, we leverage 3D reconstructions from a commodity 3D scanner for open-vocabulary instance segmentation, alongside grasp pose estimation, to demonstrate dynamic picking of objects and opening of drawers. We show the performance and robustness of our model in two sets of real-world experiments, dynamic object retrieval and drawer opening, reporting success rates of 51% and 82%, respectively. The code for our framework and videos are available at https://spot-compose.github.io/.
