Watch Less, Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control

Tan-Dzung Do, Nandiraju Gireesh, Jilong Wang, He Wang

TL;DR

This work tackles generalizable articulated object manipulation with zero-shot sim-to-real transfer by replacing vision-centric policy inputs with history-based observations and a learnable variable impedance controller. It introduces a joint RL framework with a Privileged Observation Encoder φ and an Adaptation Module σ that learn latent object dynamics from observation history, and couples this with stage-aware rewards and domain randomization to train end-to-end manipulation without heuristic planning. A key contribution is the integration of a learnable Cartesian impedance controller, which produces smooth, compliant motions that adapt to object motion and contact forces and thereby improves real-world transfer. The approach reports an 84% real-world success rate on unseen objects (OpenDoor+ and OpenDrawer+ tasks) with smooth motion, suggesting a viable path toward less vision-dependent, more contact-aware robotic manipulation.

Abstract

Articulated object manipulation poses a unique challenge compared to rigid object manipulation, as the object itself represents a dynamic environment. In this work, we present a novel RL-based pipeline equipped with variable impedance control and motion adaptation leveraging observation history for generalizable articulated object manipulation, focusing on smooth and dexterous motion during zero-shot sim-to-real transfer. To mitigate the sim-to-real gap, our pipeline reduces reliance on vision: it does not feed vision data (RGBD/point cloud) directly into the policy, but first extracts useful low-dimensional data via off-the-shelf modules. We further narrow the sim-to-real gap by inferring object motion and its intrinsic properties from observation history, and by using impedance control both in simulation and in the real world. Furthermore, we develop a training setup with extensive randomization and a specialized reward system (task-aware and motion-aware) that enables multi-staged, end-to-end manipulation without heuristic motion planning. To the best of our knowledge, our policy is the first to report an 84% success rate in the real world, measured in extensive experiments with various unseen objects.
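
To make the variable impedance idea concrete, here is a minimal sketch of a Cartesian spring-damper law with per-axis stiffness that a policy could set at every step. It is not the authors' controller: the critical-damping rule `kd = 2 * sqrt(kp)`, the position-only (3D) formulation, the Jacobian-transpose torque mapping, and all function names and gain values are assumptions chosen for illustration.

```python
import numpy as np

def impedance_wrench(x_des, x, x_dot, kp):
    """Spring-damper force toward x_des with per-axis stiffness kp.

    Assumption: damping is tied to stiffness by kd = 2 * sqrt(kp)
    (critical damping for unit mass); the paper may use another rule.
    """
    x_des, x, x_dot = (np.asarray(v, dtype=float) for v in (x_des, x, x_dot))
    kp = np.asarray(kp, dtype=float)
    kd = 2.0 * np.sqrt(kp)
    return kp * (x_des - x) - kd * x_dot

def joint_torques(jacobian, wrench):
    """Map a Cartesian force to joint torques with the Jacobian transpose."""
    return np.asarray(jacobian, dtype=float).T @ np.asarray(wrench, dtype=float)

# Same 10 cm position error, two hypothetical gain settings.
error_target = [0.10, 0.0, 0.0]
origin = [0.0, 0.0, 0.0]
at_rest = [0.0, 0.0, 0.0]

stiff = impedance_wrench(error_target, origin, at_rest, kp=[400.0] * 3)
soft = impedance_wrench(error_target, origin, at_rest, kp=[100.0] * 3)
print(stiff, soft)
```

With a lower stiffness the same tracking error yields a smaller commanded force, which is the sense in which a softer gain lets the end effector comply with the path imposed by a hinge or drawer rail.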

Paper Structure

This paper contains 16 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: In simulation, we train a Privileged Observation Encoder $\phi$ to extract the latent representation $z^t$ of privileged information and simultaneously train an Adaptation Module $\sigma$ to infer an estimate $\tilde{z}^t$ of this representation from the $H=10$ previous $(o^t, a^{t-1})$ pairs. The latent representation $z^t$ is then concatenated with the desired grasping pose $p^t$, robot proprioception $q^t$, robot-object distance $\delta^t$, and categorical object parameters to form the policy input. In the real world, we roll out the trained policy with the Adaptation Module $\sigma$ in an end-to-end manner, executing reaching, grasping, and manipulating. We use one RGBD image captured at the first frame to extract the desired grasping pose via off-the-shelf vision modules.
  • Figure 2: We extensively evaluate our policy in the real world on a wide range of unseen objects that vary in appearance, size, hinge orientation, and hinge stiffness. We demonstrate our performance in a reasonable workspace, with objects facing front or rotated slightly about the $z$ axis.
  • Figure 3: Our learned controller gains actively adapt to the manipulation stage even without a direct reward on the gains: stiffer while reaching, softer while opening.
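
The data flow described in the Figure 1 caption can be sketched as follows: a rolling buffer of the last $H=10$ $(o^t, a^{t-1})$ pairs feeds the Adaptation Module, whose output is concatenated with the other low-dimensional inputs to form the policy input. This is a rough NumPy illustration, not the authors' implementation. Only $H=10$ and the list of concatenated quantities come from the caption; every dimension, the MLP layout, the random weights, the mean-squared regression loss, and all names are hypothetical.

```python
from collections import deque
import numpy as np

H = 10  # history length stated in the Figure 1 caption

class HistoryBuffer:
    """Rolling window of the last H (observation, previous action) pairs."""
    def __init__(self, obs_dim, act_dim, horizon=H):
        pair_dim = obs_dim + act_dim
        self.buf = deque([np.zeros(pair_dim) for _ in range(horizon)],
                         maxlen=horizon)

    def push(self, obs, prev_action):
        self.buf.append(np.concatenate([obs, prev_action]))

    def flat(self):
        return np.concatenate(list(self.buf))

def init_mlp(sizes, rng):
    """Random weights for a small MLP (hypothetical layer sizes)."""
    return [(0.1 * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Forward pass with tanh on hidden layers, linear output."""
    for i, (w, b) in enumerate(layers):
        x = w @ x + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

def adaptation_loss(z_tilde, z):
    """Assumed regression target: match the encoder's latent in simulation."""
    return float(np.mean((z_tilde - z) ** 2))

def policy_input(z, grasp_pose, proprio, distance, category):
    """Concatenate the quantities listed in the Figure 1 caption."""
    return np.concatenate([z, grasp_pose, proprio, distance, category])

rng = np.random.default_rng(0)
history = HistoryBuffer(obs_dim=12, act_dim=7)    # hypothetical sizes
history.push(np.ones(12), np.zeros(7))
sigma = init_mlp([H * (12 + 7), 64, 8], rng)      # Adaptation Module
z_tilde = mlp(history.flat(), sigma)              # inferred latent
x = policy_input(z_tilde, np.zeros(7), np.zeros(9), np.zeros(1),
                 np.array([1.0, 0.0]))
print(x.shape)
```

At deployment only the history buffer and the Adaptation Module are needed, since the privileged information consumed by the encoder is unavailable on the real robot.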