HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots

Tairan He, Wenli Xiao, Toru Lin, Zhengyi Luo, Zhenjia Xu, Zhenyu Jiang, Jan Kautz, Changliu Liu, Guanya Shi, Xiaolong Wang, Linxi Fan, Yuke Zhu

TL;DR

Humanoid whole-body control has been held back by mode-specific controllers that hinder transfer between locomotion, manipulation, and tabletop tasks. The authors propose HOVER, a unified neural controller that supports many control modes by distilling a high-capability oracle motion imitator, which is trained on large-scale MoCap data retargeted to a humanoid. By using a unified command space and a mask-based distillation pipeline, HOVER shares core motor skills across modes and enables seamless transitions without retraining. Results in simulation and on a real Unitree H1 show that HOVER outperforms specialist baselines and a competitive multi-mode RL approach across diverse metrics, and that it operates robustly in the real world, including when switching modes.

Abstract

Humanoid whole-body control requires adapting to diverse tasks such as navigation, loco-manipulation, and tabletop manipulation, each demanding a different mode of control. For example, navigation relies on root velocity tracking, while tabletop manipulation prioritizes upper-body joint angle tracking. Existing approaches typically train individual policies tailored to a specific command space, limiting their transferability across modes. We present the key insight that full-body kinematic motion imitation can serve as a common abstraction for all these tasks and provide general-purpose motor skills for learning multiple modes of whole-body control. Building on this, we propose HOVER (Humanoid Versatile Controller), a multi-mode policy distillation framework that consolidates diverse control modes into a unified policy. HOVER enables seamless transitions between control modes while preserving the distinct advantages of each, offering a robust and scalable solution for humanoid control across a wide range of modes. By eliminating the need for policy retraining for each control mode, our approach improves efficiency and flexibility for future humanoid applications.

Paper Structure

This paper contains 27 sections, 6 figures, 5 tables.

Figures (6)

  • Figure 1: HOVER enables versatile humanoid control with a unified multi-mode command space. The versatile multi-mode command space supports kinematic position tracking (blue), local joint angle tracking (yellow), and root tracking (purple). Highlighted boxes indicate active commands being tracked, while the masks (dashed boxes on the right) allow selective activation of different command spaces to accommodate various tasks.
  • Figure 2: Overview of HOVER distillation process. The HOVER policy is distilled from the Oracle policy through proprioception and command masking. The task commands for the student are determined via mode-specific and sparsity-based masks, applied to both upper and lower body motions independently. These masks generate diverse task command modes, refining the student's inputs. The distillation employs DAgger to align the student’s actions with those of the oracle, optimizing through supervised learning on the oracle’s actions.
  • Figure 3: Comparison between prior-work specialists (blue) and our generalist policy (green) under corresponding modes. The metrics used are: upper/lower joint error (rad), global/local body position error (mm), root velocity error (m/s), and root rotation error (rad). These metrics evaluate how accurately each policy tracks reference motions and joint configurations across different control modes. The commands being tracked (activated) in each mode are colored blue.
  • Figure 4: We assess the tracking accuracy of two multi-mode control policies—HOVER (green) and Multi-Mode RL (purple)—across eight distinct humanoid control modes. The comparison is visualized across four key performance metrics in the radar charts.
  • Figure 5: Real-World Evaluations on different control modes.
  • ...and 1 more figure
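
The masking and distillation steps described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation. The command layout (`COMMAND_SLICES`, `COMMAND_DIM`), the keep probability, and the function names `sample_command_mask` and `distillation_loss` are assumptions made for illustration. The sketch also assumes that one command type is picked per body half and that the supervised objective is a mean-squared error on actions. The caption only states that mode-specific and sparsity-based masks are applied to upper and lower body independently, and that the student is trained with DAgger through supervised learning on the oracle's actions.

```python
import numpy as np

# Hypothetical command layout: slice sizes are illustrative, not from the paper.
COMMAND_SLICES = {
    ("upper", "kinematic_position"): slice(0, 6),
    ("upper", "joint_angle"): slice(6, 12),
    ("lower", "kinematic_position"): slice(12, 18),
    ("lower", "joint_angle"): slice(18, 24),
    ("lower", "root"): slice(24, 28),
}
COMMAND_DIM = 28


def sample_command_mask(rng, keep_prob=0.5):
    """Sample a binary mask over the student's command vector.

    For each body half, a mode-specific mask picks one command type,
    then a sparsity mask keeps a random subset of its entries.
    """
    mask = np.zeros(COMMAND_DIM)
    for body in ("upper", "lower"):
        modes = [m for (b, m) in COMMAND_SLICES if b == body]
        mode = modes[rng.integers(len(modes))]             # mode-specific choice
        sl = COMMAND_SLICES[(body, mode)]
        keep = rng.random(sl.stop - sl.start) < keep_prob  # sparsity-based choice
        mask[sl] = keep
    return mask


def distillation_loss(student_action, oracle_action):
    """Supervised loss on oracle actions (assumed here to be mean-squared error)."""
    diff = np.asarray(student_action) - np.asarray(oracle_action)
    return float(np.mean(diff ** 2))


# Usage: mask a full reference command before feeding it to the student.
rng = np.random.default_rng(0)
full_command = rng.normal(size=COMMAND_DIM)
masked_command = full_command * sample_command_mask(rng)
```

In a DAgger-style loop, the student would act on `masked_command` in the environment, the oracle would label the visited states with its own actions, and `distillation_loss` would be minimized on those pairs.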