Learning to Open and Traverse Doors with a Legged Manipulator

Mike Zhang; Yuntao Ma; Takahiro Miki; Marco Hutter

TL;DR

The paper tackles autonomous door opening and traversal with a legged manipulator, covering push and pull doors with varying dynamics. It introduces a teacher-student framework: a teacher policy is trained with reinforcement learning on privileged observations available only in simulation, and a student policy, restricted to the observations available at deployment, imitates the teacher while estimating hidden door properties. The resulting single policy infers the door opening direction online and handles multiple door types, achieving a $95.0\%$ success rate in real-world trials on the ANYmal platform and remaining robust to disturbances. The work extends the spaces legged robots can access autonomously and reports sim-to-real transfer supported by domain randomization.

Abstract

Using doors is a longstanding challenge in robotics and is of significant practical interest in giving robots greater access to human-centric spaces. The task is challenging due to the need for online adaptation to varying door properties and precise control in manipulating the door panel and navigating through the confined doorway. To address this, we propose a learning-based controller for a legged manipulator to open and traverse doors. The controller is trained using a teacher-student approach in simulation to learn robust task behaviors as well as to estimate crucial door properties during the interaction. Unlike previous works, our approach is a single control policy that can handle both push and pull doors through learned behavior, which infers the opening direction during deployment without prior knowledge. The policy was deployed on the ANYmal legged robot with an arm and achieved a success rate of 95.0% in repeated trials conducted in an experimental setting. Additional experiments validate the policy's effectiveness and robustness to various doors and disturbances. A video overview of the method and experiments can be found at youtu.be/tQDZXN_k5NU.
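
As a rough illustration of the teacher-student idea described above, the student's training objective can be thought of as an imitation term plus an estimation term. The sketch below is not the authors' implementation: the mean-squared-error imitation loss, the cross-entropy loss over door types, the four door-type labels, and the `estimation_weight` parameter are all assumptions made for the example.

```python
import math
from typing import Sequence

# Hypothetical door-type labels (swing side x opening direction); not taken from the paper.
DOOR_TYPES = ("push_left", "push_right", "pull_left", "pull_right")


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(student_action: Sequence[float], teacher_action: Sequence[float]) -> float:
    """Mean squared error between student and teacher actions (assumed loss form)."""
    if len(student_action) != len(teacher_action):
        raise ValueError("action vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student_action, teacher_action)) / len(student_action)


def door_type_loss(door_type_logits: Sequence[float], true_index: int) -> float:
    """Cross-entropy between the predicted door-type distribution and the true type."""
    probs = softmax(door_type_logits)
    return -math.log(probs[true_index])


def student_loss(
    student_action: Sequence[float],
    teacher_action: Sequence[float],
    door_type_logits: Sequence[float],
    true_index: int,
    estimation_weight: float = 1.0,  # hypothetical weighting between the two terms
) -> float:
    """Imitation term plus weighted estimation term."""
    return imitation_loss(student_action, teacher_action) + estimation_weight * door_type_loss(
        door_type_logits, true_index
    )
```

In simulation the true door type is known, so it can supervise the estimation term; at deployment only the student's own estimate is available.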

Paper Structure (29 sections, 4 equations, 14 figures)

Introduction
Related Work
Model-Based Door Opening
Learning-Based Door Opening Control
Teacher-Student Distillation
Method
Door Model
Policy Actions
Sim-to-Real Considerations
Teacher Training
Observations
Rewards
Student Training
Results
A Single Control Policy for All Door Types
...and 14 more sections

Figures (14)

Figure 1: We present a control policy that can open and pass through both pull (top) and push (bottom) doors by estimating the door properties during deployment.
Figure 2: Overview of the training method. Dashed lines indicate the flow of gradients from the losses. The teacher policy is first trained using RL operating on a privileged set of observations. The student policy operates only on the observations available during deployment and is trained to imitate the teacher's behavior while also estimating the privileged information.
Figure 3: Overview of the training environment in simulation. A hook end-effector is used for grasping the door handle.
Figure 4: Additional rewards are used for pull doors to encourage the robot to move around the door panel. When the robot base is in $\mathcal{Z}_1$ it will receive reward $r_{\mathcal{Z}_1}$ and in $\mathcal{Z}_2$ it will receive reward $r_{\mathcal{Z}_1} + r_{\mathcal{Z}_2}$. These rewards are also given based on the end-effector location.
Figure 5: Real-world experiments of the policy deployed on the real robot, traversing through doors of varying swing (left/right) and opening (push/pull) directions. The policy's estimated probability for each door type over time is plotted below the corresponding experiment. The true door type is plotted as a solid line while others are plotted as dashed lines.
...and 9 more figures
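
The zone rewards in the Figure 4 caption can be read as a cumulative scheme: being in $\mathcal{Z}_1$ earns $r_{\mathcal{Z}_1}$, and being in $\mathcal{Z}_2$ earns $r_{\mathcal{Z}_1} + r_{\mathcal{Z}_2}$. The sketch below encodes only that rule. The zone membership flags are taken as inputs because the zone geometry is not given here, and summing the base and end-effector terms is an assumption, as are the function names and the numeric values in the usage example.

```python
def zone_reward(in_z1: bool, in_z2: bool, r_z1: float, r_z2: float) -> float:
    """Cumulative zone reward: Z2 pays r_z1 + r_z2, Z1 pays r_z1, elsewhere 0."""
    if in_z2:
        return r_z1 + r_z2
    if in_z1:
        return r_z1
    return 0.0


def pull_door_zone_reward(
    base_in_z1: bool,
    base_in_z2: bool,
    ee_in_z1: bool,
    ee_in_z2: bool,
    r_z1: float,
    r_z2: float,
) -> float:
    """Apply the same rule to the robot base and the end-effector.

    Adding the two terms is an assumption; the caption only says the rewards
    are also given based on the end-effector location.
    """
    return zone_reward(base_in_z1, base_in_z2, r_z1, r_z2) + zone_reward(
        ee_in_z1, ee_in_z2, r_z1, r_z2
    )


# Usage with hypothetical reward magnitudes: base in Z1, end-effector in Z2.
example = pull_door_zone_reward(True, False, False, True, r_z1=0.5, r_z2=1.0)
```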
