Watch Less, Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control
Tan-Dzung Do, Nandiraju Gireesh, Jilong Wang, He Wang
TL;DR
This work tackles generalizable articulated object manipulation with zero-shot sim-to-real transfer by replacing vision-centric policy inputs with history-based observations and a learnable variable impedance controller. It introduces a joint RL framework with a Privileged Observation Encoder φ and an Adaptation Module σ that learn latent object dynamics from observation history, and combines it with stage-aware rewards and domain randomization to train end-to-end manipulation without heuristic planning. A key contribution is the integration of a learnable Cartesian impedance controller, which produces smooth, compliant motions that adapt to object motion and contact forces and thereby improves real-world transfer. The approach achieves high real-world success rates on unseen objects (OpenDoor+ and OpenDrawer+ tasks) and shows robust generalization with motion smooth enough for practical deployment, suggesting a viable path toward less vision-dependent, more contact-aware robotic manipulation.
Abstract
Articulated object manipulation poses a unique challenge compared to rigid object manipulation, as the object itself represents a dynamic environment. In this work, we present a novel RL-based pipeline equipped with variable impedance control and motion adaptation leveraging observation history for generalizable articulated object manipulation, focusing on smooth and dexterous motion during zero-shot sim-to-real transfer. To mitigate the sim-to-real gap, our pipeline reduces reliance on vision: rather than feeding visual features (RGB-D images or point clouds) directly to the policy, it first extracts useful low-dimensional data via off-the-shelf modules. We further narrow the sim-to-real gap by inferring object motion and intrinsic properties from observation history and by using impedance control both in simulation and in the real world. Furthermore, we develop a training setup with extensive randomization and a specialized reward system (task-aware and motion-aware) that enables multi-stage, end-to-end manipulation without heuristic motion planning. To the best of our knowledge, our policy is the first to achieve an 84% success rate in the real world, as measured in extensive experiments on a variety of unseen objects.
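
To make the variable impedance control idea concrete, the following is a minimal sketch of a translational Cartesian impedance law whose stiffness is set by a policy output. It is an illustration under stated assumptions, not the controller used in this work: the function names, the gain range (k_min, k_max), the unit apparent mass used to derive damping, and the restriction to three translational axes are all hypothetical.

```python
import numpy as np


def scale_action_to_stiffness(action, k_min=50.0, k_max=500.0):
    """Map a policy output in [-1, 1] to a per-axis stiffness in [k_min, k_max].

    The gain range is a hypothetical choice for illustration only.
    """
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    return k_min + 0.5 * (action + 1.0) * (k_max - k_min)


def impedance_force(x_des, x, v_des, v, stiffness, damping_ratio=1.0):
    """Spring-damper law: F = K (x_des - x) + D (v_des - v).

    Damping is derived from stiffness as d = 2 * zeta * sqrt(k), which
    assumes a unit apparent mass (an assumption, not taken from the paper).
    Orientation terms are omitted to keep the sketch short.
    """
    stiffness = np.asarray(stiffness, dtype=float)
    K = np.diag(stiffness)
    D = np.diag(2.0 * damping_ratio * np.sqrt(stiffness))
    return K @ (np.asarray(x_des) - np.asarray(x)) + D @ (np.asarray(v_des) - np.asarray(v))


# Example: stiff along x, medium along y, soft along z.
stiffness = scale_action_to_stiffness([1.0, 0.0, -1.0])
force = impedance_force(
    x_des=[0.01, 0.0, 0.0],   # target 1 cm ahead along x
    x=[0.0, 0.0, 0.0],
    v_des=[0.0, 0.0, 0.0],
    v=[0.0, 0.0, 0.0],
    stiffness=stiffness,
)
```

In this sketch, lowering the stiffness along an axis lets the end-effector yield to the motion of the door or drawer along that axis, while a higher stiffness tracks the target pose more tightly. Letting the policy choose the gains at every step is what makes the impedance "variable".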
