WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts

Chong Zhang; Wenli Xiao; Tairan He; Guanya Shi

TL;DR

WoCoCo is a unified RL framework for learning whole-body humanoid control under sequential contact plans. It decomposes each task into contact stages and uses a compact, task-agnostic reward design (dense contact, stage progression, and curiosity), so that only one or two task-related terms need to be specified per task. A sim-to-real curriculum with domain randomization and regularization lets end-to-end policies transfer from simulation to a real humanoid without motion priors. The framework is demonstrated on four real-world humanoid tasks (parkour jumping, box loco-manipulation, clap-and-tap dancing, and cliffside climbing), with robustness to perturbations and unseen environments, and is also applied to loco-manipulation tasks on a 22-DoF dinosaur robot to show that it extends beyond humanoids.

Abstract

Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world. They are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design, and it suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework for learning whole-body humanoid control with sequential contacts by naturally decomposing tasks into separate contact stages. This decomposition enables simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrate that end-to-end RL-based controllers trained with WoCoCo accomplish four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoids by applying it to 22-DoF dinosaur robot loco-manipulation tasks.
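The stage decomposition described in the abstract can be illustrated with a small sketch. This is not the paper's implementation or its reward formula: the `ContactStage` class, the fraction-of-matching-contacts reward, and the `STAGE_BONUS` value are all hypothetical choices made only to show how a contact plan might be stepped through one stage at a time. The curiosity term and the task-related terms are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical end-effector names; the source only says "hands and feet".
END_EFFECTORS = ("left_hand", "right_hand", "left_foot", "right_foot")

# Hypothetical bonus granted when a stage is fulfilled (not from the paper).
STAGE_BONUS = 1.0


@dataclass(frozen=True)
class ContactStage:
    """One stage of a contact plan: desired contact state per end effector."""
    contact_goal: Dict[str, bool]


def contact_reward(stage: ContactStage, contacts: Dict[str, bool]) -> float:
    """Dense term (assumed form): fraction of end effectors matching the goal."""
    matches = sum(contacts[e] == stage.contact_goal[e] for e in END_EFFECTORS)
    return matches / len(END_EFFECTORS)


def stage_fulfilled(stage: ContactStage, contacts: Dict[str, bool]) -> bool:
    """A stage counts as fulfilled when every end effector matches its goal."""
    return all(contacts[e] == stage.contact_goal[e] for e in END_EFFECTORS)


def step_plan(
    plan: List[ContactStage], stage_idx: int, contacts: Dict[str, bool]
) -> Tuple[float, int]:
    """Return (reward, next_stage_idx) for one control step."""
    stage = plan[stage_idx]
    reward = contact_reward(stage, contacts)
    # Advance to the next stage and add the bonus once the stage is fulfilled.
    if stage_fulfilled(stage, contacts) and stage_idx < len(plan) - 1:
        stage_idx += 1
        reward += STAGE_BONUS
    return reward, stage_idx


# Example plan: stand on both feet, then add both hands (e.g. on a wall).
feet_only = {"left_hand": False, "right_hand": False,
             "left_foot": True, "right_foot": True}
all_four = {e: True for e in END_EFFECTORS}
plan = [ContactStage(feet_only), ContactStage(all_four)]
```

Calling `step_plan(plan, 0, feet_only)` fulfils the first stage and advances the index. Calling it again with the same contacts at stage 1 yields only partial dense reward, because the hands are not yet in contact.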

Paper Structure (42 sections, 11 equations, 11 figures, 3 tables)

Introduction
Overview: Learning with Sequential Contacts and Task Decomposition
WoCoCo Rewards and Sim-to-Real Transfer
WoCoCo Rewards
Sim-to-Real Transfer
Case Studies
Case I: Versatile Parkour Jumping
Case II: Anywhere-to-Anywhere Box Loco-Manipulation
Case III: Dynamic Clap-and-Tap Dancing
Case IV: Bidirectional Cliffside Climbing
Beyond Humanoid: Dinosaur Loco-Manipulation
Analyses and Ablations
Limitation and Future Works
An Illustrative Example of Symbols in Contact Rewards
Ablation Baseline Behaviors
...and 27 more sections

Figures (11)

Figure 1: An overview of WoCoCo and tasks. (A) We decompose the task into separate contact stages, where each contact stage is defined by the contact goal and the task goal. (B)-(E): We applied our WoCoCo framework to various challenging tasks. Contact goals are visualized in blue, which involve some or all of the end effectors (i.e., hands and feet).
Figure 2: Learned versatile jumping motions in simulation and the real world. Upper Row: The humanoid performs continuous jumps with varying foot contact sequences and upper body posture goals, demonstrating robustness against unseen gravel. Lower Row: We transfer the policy to the real world, testing jumps with double-foot contacts at different heights and a "hug" posture.
Figure 3: Learned whole-body box loco-manipulation behaviors in the real world.
Figure 4: Learned dancing motions in simulation and the real world. Black bounding boxes indicate the foot contact goals and the hand task goals.
Figure 5: Learned cliffside climbing behavior in simulation and the real world. The humanoid exhibited resilience against perturbations and compliance during contact with unseen gravel.
...and 6 more figures
