Opening Articulated Structures in the Real World
Arjun Gupta, Michelle Zhang, Rishik Sathua, Saurabh Gupta
TL;DR
The paper tackles the problem of generalizing mobile manipulation to unseen objects in unseen environments by introducing MOSART, a modular system that combines on-board perception (APM), whole-body motion planning (SeqIK), and proprioceptive adaptation for zero-shot opening of articulated structures. Through a large-scale real-world evaluation across 13 sites and 31 objects, the authors show that a modular approach outperforms end-to-end imitation learning even when the latter is trained on more than 1,000 demonstrations, and they identify perception, rather than precise end-effector control, as the main bottleneck. Key contributions include the Articulation-parameter Prediction Module (APM), a two-stage RGB-D approach with 3D lifting, and a contact-based adaptation strategy that improves last-centimeter grasping. The work provides a pragmatic roadmap for system-level research in generalizable mobile manipulation and highlights concrete directions to enhance perception and grasping robustness in real-world deployments.
Abstract
What does it take to build mobile manipulation systems that can competently operate on previously unseen objects in previously unseen environments? This work answers this question using opening of articulated structures as a mobile manipulation testbed. Specifically, our focus is on the end-to-end performance on this task without any privileged information, i.e., the robot starts at a location with the novel target articulated object in view, and has to approach the object and successfully open it. We first develop a system for this task, and then conduct 100+ end-to-end system tests across 13 real-world test sites. Our large-scale study reveals a number of surprising findings: a) modular systems outperform end-to-end learned systems for this task, even when the end-to-end learned systems are trained on 1000+ demonstrations, b) perception, and not precise end-effector control, is the primary bottleneck to task success, and c) state-of-the-art articulation parameter estimation models developed in isolation struggle when faced with robot-centric viewpoints. Overall, our findings highlight the limitations of developing components of the pipeline in isolation and underscore the need for system-level research, providing a pragmatic roadmap for building generalizable mobile manipulation systems. Videos, code, and models are available on the project website: https://arjung128.github.io/opening-articulated-structures/
