OmniClone: Engineering a Robust, All-Rounder Whole-Body Humanoid Teleoperation System

Yixuan Li, Le Ma, Yutang Lin, Yushi Du, Mengya Liu, Kaizhe Hu, Jieming Cui, Yixin Zhu, Wei Liang, Baoxiong Jia, Siyuan Huang

Abstract

Whole-body humanoid teleoperation enables humans to remotely control humanoid robots, serving as both a real-time operational tool and a scalable engine for collecting demonstrations for autonomous learning. Despite recent advances, existing systems are validated using aggregate metrics that conflate distinct motion regimes, masking critical failure modes. This lack of diagnostic granularity, compounded by tightly coupled and labor-intensive system configurations, hinders robust real-world deployment. A key open challenge is building a teleoperation system that is simultaneously robust, versatile, and affordable for practical use. Here we present OmniClone, a whole-body humanoid teleoperation system that achieves high-fidelity, multi-skill control on a single consumer GPU with modest data requirements. Central to our approach is OmniBench, a diagnostic benchmark that evaluates policies across stratified motion categories and difficulty levels on unseen motions, exposing the narrow specialization of prior systems. Guided by these diagnostics, we identify an optimized training data recipe and integrate two system-level improvements, subject-agnostic retargeting and robust communication, that collectively reduce Mean Per-Joint Position Error (MPJPE) by over 66% while requiring orders-of-magnitude fewer computational resources than comparable methods. Crucially, OmniClone is control-source-agnostic: a single unified policy supports real-time teleoperation, generated motion playback, and Vision-Language-Action (VLA) models, while generalizing across operators of vastly different body proportions. By uniting diagnostic evaluation with practical engineering, OmniClone provides an accessible foundation for scalable humanoid teleoperation and autonomous learning.
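To make the abstract's two central ideas concrete, the following minimal Python sketch computes MPJPE under its usual definition (the mean Euclidean distance between corresponding predicted and reference joint positions) and then contrasts an aggregate score with per-category scores. It is an illustration only, not the authors' evaluation code: the function names `mpjpe` and `stratified_means`, the category labels, and the error values are all hypothetical and do not come from OmniBench.

```python
import math
from collections import defaultdict


def mpjpe(pred, ref):
    """Mean Euclidean distance between corresponding joints.

    pred, ref: sequences of frames, each frame a sequence of (x, y, z)
    joint positions. The result is in the same unit as the inputs.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of frames")
    total, count = 0.0, 0
    for frame_pred, frame_ref in zip(pred, ref):
        if len(frame_pred) != len(frame_ref):
            raise ValueError("frames must have the same number of joints")
        for joint_pred, joint_ref in zip(frame_pred, frame_ref):
            total += math.dist(joint_pred, joint_ref)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


def stratified_means(clip_errors):
    """Return (aggregate mean, per-category means) for (category, error) pairs."""
    by_category = defaultdict(list)
    for category, error in clip_errors:
        by_category[category].append(error)
    aggregate = sum(error for _, error in clip_errors) / len(clip_errors)
    per_category = {c: sum(v) / len(v) for c, v in by_category.items()}
    return aggregate, per_category


# Hypothetical per-clip errors in mm (made-up numbers, not paper results):
# four easy clips and one hard clip.
clip_errors = [("walk", 20.0)] * 4 + [("squat", 120.0)]
aggregate, per_category = stratified_means(clip_errors)
print(aggregate)     # 40.0
print(per_category)  # {'walk': 20.0, 'squat': 120.0}
```

In this toy data the aggregate of 40 mm looks moderate even though the single squat clip is off by 120 mm, which is the kind of masking that a stratified benchmark is meant to expose.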

Paper Structure (27 sections, 4 equations, 9 figures, 5 tables)

This paper contains 27 sections, 4 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: OmniClone achieves well-balanced, high-fidelity whole-body tracking across all MPJPE dimensions on OmniBench while enabling versatile real-world deployment. Center: the radar map compares MPJPE (mm) across 18 stratified evaluation categories, showing that OmniClone consistently outperforms the state-of-the-art baselines GMT and Twist2. Surrounding panels illustrate real-world teleoperation with a unified policy, spanning dynamic whole-body motions (sprint, long jump) and stable, long-horizon dexterous loco-manipulation (transport, deformable object handling).
  • Figure 2: Success rate (SR) comparison on OmniBench reveals narrow skill specialization of existing methods. While GMT and Twist2 suffer notable drops in lower-body agility tasks (deep squatting, low-altitude loco-manipulation) and high-dynamic maneuvers (jumping), OmniClone maintains near-perfect SR across all 18 stratified categories. Complementing the MPJPE radar map in Figure 1, these results confirm that prior systems' aggregate scores mask significant per-category failures.
  • Figure 3: Overview of the OmniClone framework, comprising model training (top) and system infrastructure (bottom). Top: a teacher-student distillation pipeline trained on an optimized data recipe, whose composition is iteratively refined via OmniBench diagnostics. Bottom: the deployment infrastructure features subject-agnostic retargeting and robust wireless communication, yielding a control-source-agnostic interface compatible with real-time teleoperation, generative motion models, and downstream VLA planners such as GR00T N1.6.
  • Figure 4: Ablations on training data recipes show how data composition steers policy behavior. OmniCloneDynamic excels at agile motions but collapses on manipulation and squatting; OmniCloneStable recovers stability at the cost of dynamic agility; OmniCloneBalance improves breadth across both regimes. The final OmniClone recipe, obtained by filtering biased sequences from OmniCloneBalance, achieves the best overall trade-off in both (a) SR and (b) MPJPE.
  • Figure 5: OmniClone generalizes reliably across teleoperators of vastly different heights. Six participants (1.47 m to 1.94 m) each perform a composite loco-manipulation task (walk, stabilize, pick-and-place). Despite a 47 cm height span, the system maintains consistent stability and task success throughout, with all novice operators completing the task within 5 to 7 practice trials.
  • ...and 4 more figures