Humanoid Hanoi: Investigating Shared Whole-Body Control for Skill-Based Box Rearrangement

Minku Kim; Kuan-Chia Chen; Aayam Shrestha; Li Fuxin; Stefan Lee; Alan Fern

TL;DR

This work tackles the challenge of long-horizon humanoid box rearrangement by orchestrating reusable loco-manipulation skills through a single shared whole-body controller (WBC). It demonstrates that naive reuse of a pretrained WBC can degrade robustness, and proposes rollout-based data aggregation to expand the WBC's coverage without altering the high-level skill interfaces. The Humanoid Hanoi benchmark is introduced to quantify long-horizon performance, with simulations and hardware experiments (Digit V3) showing improved stability and success over non-shared baselines. The findings support a scalable, task-agnostic control framework for humanoids, highlighting practical gains in robustness and offering concrete directions for future improvements in perception, placement stabilization, and real-world reliability.

Abstract

We investigate a skill-based framework for humanoid box rearrangement that enables long-horizon execution by sequencing reusable skills at the task level. In our architecture, all skills execute through a shared, task-agnostic whole-body controller (WBC), providing a consistent closed-loop interface for skill composition, in contrast to non-shared designs that use separate low-level controllers per skill. We find that naively reusing the same pretrained WBC can reduce robustness over long horizons, as new skills and their compositions induce shifted state and command distributions. We address this with a simple data aggregation procedure that augments shared-WBC training with rollouts from closed-loop skill execution under domain randomization. To evaluate the approach, we introduce Humanoid Hanoi, a long-horizon Tower-of-Hanoi box rearrangement benchmark, and report results in simulation and on the Digit V3 humanoid robot, demonstrating fully autonomous rearrangement over extended horizons and quantifying the benefits of the shared-WBC approach over non-shared baselines.
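The data aggregation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the buffer, the randomized parameters (`friction`, `box_mass_kg`), the rollout stub, and the three-element command vector are all hypothetical placeholders. The sketch only shows the overall shape, in which samples gathered from closed-loop skill rollouts under randomized conditions are appended to the data the shared WBC was originally trained on.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CommandBuffer:
    """Pool of samples used to (re)train the shared WBC. Hypothetical structure."""
    samples: list = field(default_factory=list)

    def extend(self, rollout):
        self.samples.extend(rollout)


def sample_randomization(rng):
    # Hypothetical randomized physical parameters; the real set is not given in the abstract.
    return {
        "friction": rng.uniform(0.5, 1.2),
        "box_mass_kg": rng.uniform(0.5, 3.0),
    }


def rollout_skill(skill_name, params, rng, horizon=5):
    """Stand-in for closed-loop execution of one skill through the WBC.

    A real rollout would step a simulator; here each step just records a
    random 3-element command so the aggregation logic can be exercised.
    """
    return [
        {
            "skill": skill_name,
            "step": t,
            "params": params,
            "command": [rng.uniform(-1.0, 1.0) for _ in range(3)],
        }
        for t in range(horizon)
    ]


def aggregate(base_samples, skills, episodes, seed=0):
    """Append skill-rollout samples to the original WBC training samples."""
    rng = random.Random(seed)
    buffer = CommandBuffer(list(base_samples))
    for _ in range(episodes):
        params = sample_randomization(rng)  # one randomized world per episode
        for skill in skills:
            buffer.extend(rollout_skill(skill, params, rng))
    return buffer
```

For example, `aggregate(base, ["GoTo", "Pickup", "GoTo with Box", "Place"], episodes=3)` keeps every original sample in `base` and adds 3 × 4 × 5 rollout samples, after which the shared controller would be retrained on the combined pool.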

Paper Structure (19 sections, 2 equations, 6 figures, 7 tables)

Introduction
Related Work
Learning-Based Humanoid Loco-Manipulation
Humanoid Box Loco-Manipulation
System Overview
Shared Whole-Body Controller
Skill Learning
Shared WBC Coverage Expansion
Humanoid Hanoi Benchmark
Simulation Experiments
Approaches Compared
Individual Skill Evaluation
Humanoid Hanoi Evaluation
Humanoid Hanoi Analysis of Failure Modes
Hardware Experiments
...and 4 more sections

Figures (6)

Figure 1: Humanoid Hanoi, a problem instance from the Tower-of-Hanoi box rearrangement benchmark. The robot moves boxes between three towers (T1–T3) while respecting stacking constraints. The panels illustrate representative stages of a successful hardware execution (5+ min), with the corresponding symbolic state shown on the right. This benchmark stresses long-horizon autonomy by requiring repeated skill chaining with precise placement under constraints.
Figure 2: Independently trained high-level skills generate task-level commands that are executed through a shared, task-agnostic whole-body controller (WBC). The WBC produces joint-level PD targets that are tracked by a low-level PD controller on the robot. Closed-loop rollouts from composed execution are aggregated to retrain the shared controller, improving robustness while preserving a unified control interface. The base action $a_t^{\text{base}}$ specifies base locomotion commands, including a stand bit, planar velocity targets, and yaw rate.
Figure 3: Long-horizon Humanoid Hanoi execution. Sequential snapshots show a complete Tower-of-Hanoi-style box rearrangement episode. Transparent overlays visualize the executed robot trajectory over time, and T1, T2, and T3 denote the target tower locations. The benchmark is divided into seven moves, each consisting of a sequence of GoTo, Pickup, GoTo with Box, and Place skills.
Figure 4: Cumulative task success rates for the Humanoid Hanoi benchmark. Each move consists of four skills: GoTo, Pickup, GoTo with Box, and Place, shown as small markers, with large markers indicating move completion. Drops between consecutive points indicate failures in the subsequent skill (e.g., a drop from Pickup to GoTo with Box indicates failures during the GoTo with Box execution).
Figure 5: Three Humanoid Hanoi configurations (C1–C3), where T1, T2, and T3 denote the tower locations defining the stack positions.
...and 1 more figure
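The move and skill structure described in the captions for Figures 3 and 4 (seven moves, each made of GoTo, Pickup, GoTo with Box, and Place) can be sketched as a small planner. This is an illustrative sketch only: it assumes three boxes (the count for which Tower of Hanoi needs 2^3 − 1 = 7 moves), a T1-to-T3 transfer, and a tower name as the argument of each skill. None of these function names or argument conventions come from the paper.

```python
# The four skills that make up one move, per the figure captions.
SKILLS = ("GoTo", "Pickup", "GoTo with Box", "Place")


def hanoi_moves(n, src="T1", dst="T3", aux="T2"):
    """Classic recursive Tower of Hanoi: list of (from_tower, to_tower) moves."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, aux, dst)   # clear the n-1 smaller boxes out of the way
        + [(src, dst)]                      # move the largest remaining box
        + hanoi_moves(n - 1, aux, dst, src) # restack the smaller boxes on top of it
    )


def expand_to_skills(moves):
    """Expand each symbolic move into the four-skill sequence (hypothetical arguments)."""
    plan = []
    for src, dst in moves:
        plan.append(("GoTo", src))           # walk to the source tower
        plan.append(("Pickup", src))         # lift the top box
        plan.append(("GoTo with Box", dst))  # carry it to the destination tower
        plan.append(("Place", dst))          # set it on the destination stack
    return plan


moves = hanoi_moves(3)
plan = expand_to_skills(moves)
```

Under these assumptions the plan contains 7 moves and 28 skill invocations, which is why a single skill failure anywhere in the chain shows up as a drop in the cumulative success curve of Figure 4.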
