Sim-to-Real Learning for Humanoid Box Loco-Manipulation

Jeremy Dao, Helei Duan, Alan Fern

TL;DR

The paper tackles the challenge of enabling a humanoid robot to perform box pickup and relocation through sim-to-real reinforcement learning. It decomposes the task into five behaviors and trains five dedicated policies using a common LSTM-based framework with phase-aware rewards, achieving robust, full-body coordination. Hardware experiments on the Digit robot demonstrate successful sim-to-real transfer for diverse boxes, marking a first in fully learned loco-manipulation on a real humanoid. The work also analyzes ablations on reward design and action spaces, and outlines pathways for autonomous planning and improved transfer through richer domain randomization and real-world data.

Abstract

In this work we propose a learning-based approach to box loco-manipulation for a humanoid robot. This is a particularly challenging problem due to the need for whole-body coordination in order to lift boxes of varying weight, position, and orientation while maintaining balance. To address this challenge, we present a sim-to-real reinforcement learning approach for training general box pickup and carrying skills for the bipedal robot Digit. Our reward functions are designed to produce the desired interactions with the box while also valuing balance and gait quality. We combine the learned skills into a full system for box loco-manipulation to achieve the task of moving boxes from one table to another with a variety of sizes, weights, and initial configurations. In addition to quantitative simulation results, we demonstrate successful sim-to-real transfer on the humanoid r

Paper Structure (13 sections, 8 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: We learn box loco-manipulation policies in simulation and transfer directly to real hardware. We break the task down into 5 separate policies: Walk, Stand, PickUp, WalkWithBox, and StandWithBox.
  • Figure 2: Allowed transitions between the 5 different policies.
  • Figure 3: Example hand position trajectory for a box pickup. The hands start from the initial robot pose, move to the side of the box (shown in blue), make contact with it, and then bring it to the target location. In this example the target location is directly above the box.
  • Figure 4: Reward curve comparison between different learning setups. "Baseline" is the main learning setup described in the paper's box pickup section, "No Hand Trajectory" is the same setup without the hand-designed hand trajectory in the reward, and "Absolute Action Space" adds the policy output to a fixed position offset rather than to the current motor positions.
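
The Figure 4 caption contrasts two ways of turning a policy output into a motor position target. The sketch below illustrates that difference only. The function names, the list-based joint representation, and the numeric values are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def relative_target(policy_output: Sequence[float],
                    current_motor_pos: Sequence[float]) -> List[float]:
    """Baseline-style target, as the caption describes it:
    the policy output is added to the current motor positions."""
    return [q + a for q, a in zip(current_motor_pos, policy_output)]


def absolute_target(policy_output: Sequence[float],
                    fixed_offset: Sequence[float]) -> List[float]:
    """"Absolute Action Space" target, as the caption describes it:
    the policy output is added to a fixed position offset."""
    return [o + a for o, a in zip(fixed_offset, policy_output)]


# Hypothetical two-joint example (values chosen only for illustration).
action = [0.25, -0.5]      # policy output
measured = [1.0, 0.5]      # current motor positions
offset = [0.0, 2.0]        # fixed position offset (assumed values)

print(relative_target(action, measured))  # [1.25, 0.0]
print(absolute_target(action, offset))    # [0.25, 1.5]
```

With the same policy output, the relative form yields a target that moves with the robot's measured pose, while the absolute form always yields the same target regardless of where the motors currently are.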