Bridging the Sim-to-Real Gap for Athletic Loco-Manipulation

Nolan Fey, Gabriel B. Margolis, Martin Peticco, Pulkit Agrawal

TL;DR

The paper addresses the sim-to-real gap in athletic loco-manipulation for legged robots by introducing an Unsupervised Actuator Net (UAN) that learns corrective actuator dynamics from real data without torque sensing. This real-to-sim-to-real calibration is combined with a two-stage training regime for a whole-body controller (WBC): a pre-training phase with random references to establish motion priors, followed by task-specific fine-tuning using a reward that treats the reference trajectory as an exploratory hint. The authors demonstrate the approach on a Unitree B2 with a Z1 Pro arm across three athletic tasks (ball throwing, dumbbell snatch, and sled pulling), reporting tighter sim-to-real alignment and better real-world performance than baselines. The results highlight the practical potential of combining residual actuator modeling with guided but reward-driven policy optimization to enable dynamic, coordinated, athletic behaviors in hardware. Limitations include the reliance on task-specific reference trajectories for fine-tuning and the restriction of actuator calibration to the arm, suggesting broader subsystem modeling and automated reference generation as future work.

Abstract

Achieving athletic loco-manipulation on robots requires moving beyond traditional tracking rewards, which simply guide the robot along a reference trajectory, to task rewards that drive truly dynamic, goal-oriented behaviors. Commands such as "throw the ball as far as you can" or "lift the weight as quickly as possible" compel the robot to exhibit the agility and power inherent in athletic performance. However, training solely with task rewards introduces two major challenges: these rewards are prone to exploitation (reward hacking), and the exploration process can lack sufficient direction. To address these issues, we propose a two-stage training pipeline. First, we introduce the Unsupervised Actuator Net (UAN), which leverages real-world data to bridge the sim-to-real gap for complex actuation mechanisms without requiring access to torque sensing. UAN mitigates reward hacking by ensuring that the learned behaviors remain robust and transferable. Second, we use a pre-training and fine-tuning strategy that leverages reference trajectories as initial hints to guide exploration. With these innovations, our robot athlete learns to lift, throw, and drag with remarkable fidelity from simulation to reality.

Paper Structure

This paper contains 38 sections, 12 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Sim-to-real transfer of athletic loco-manipulation. We reduce the sim-to-real gap for a quadruped manipulator by learning a corrective model for the simulated actuator dynamics based on real-world data, formulated as an unsupervised actuator net (UAN). Policies trained with the corrected simulator exhibit improved sim-to-real transfer and push the limits of the robot's physical capabilities in athletic tasks involving whole-body coordination. Videos of the robot's behaviors are available at https://uan.csail.mit.edu/.
  • Figure 2: Unsupervised Actuator Network (UAN) approach for real-to-sim-to-real. Our training pipeline involves three steps: 1) Train a UAN to close the sim-to-real gap for actuators with complex transmission mechanisms by mapping a history of joint position and velocity errors, $\mathbf{e}_t$, to corrective torques, $\delta \boldsymbol{\tau}_t$, 2) Pre-train a WBC using random motion references (base velocity and EE pose), then fine-tune it on an athletic task reward with the UAN in the loop, and 3) Deploy. During the fine-tuning phase, the WBC initially tracks the task-specific reference, and then gradually learns to depart from the reference to maximize task performance.
  • Figure 3: Unitree Z1 Pro arm. This arm's harmonic actuators behave substantially differently from the quasi-direct-drive motors common in small legged robots. This image also shows the reinforcements we designed to ensure that the limit on athleticism comes from actuation rather than from the structural integrity of the linkage.
  • Figure 4: UAN improves simulator accuracy and real throwing performance. UAN (Ours) achieves a lower sim-to-real difference in throw distance than standard baselines, resulting in a better real throw distance. For this comparison, we train and test policies with a fixed-base arm, to avoid the risk of the legged base falling during performance-critical ablations.
  • Figure 5: End-to-end fine-tuning from a pre-trained WBC leads to the best task performance. Throwing evaluation metrics across $100$ simulated throws for four policies: Our fine-tuned WBC (Ours) achieves the longest throw distance with lower peak leg power than a throwing policy trained from scratch (No-Pre-Training) or a high-level policy for a frozen WBC (No-E2E). The WBC before fine-tuning (No-Fine-Tuning) has the lowest peak leg power but throws the ball a much shorter distance.
  • ...and 4 more figures
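
The Figure 2 caption describes the UAN as a map from a history of joint position and velocity errors, $\mathbf{e}_t$, to corrective torques, $\delta \boldsymbol{\tau}_t$. The sketch below illustrates only that interface: a small network whose output is added to the torque of an idealized PD actuator model in simulation. Everything concrete here is an assumption made for illustration and is not taken from the paper: the names `ResidualTorqueNet`, `pd_torque`, and `corrected_torque`, the one-hidden-layer architecture, the PD base model, the gains, and the history length. The unsupervised training procedure is not shown.

```python
import math
import random


def pd_torque(q_des, q, qd, kp, kd):
    """Idealized per-joint PD actuator model (assumed base model)."""
    return [kp * (d - p) - kd * v for d, p, v in zip(q_des, q, qd)]


class ResidualTorqueNet:
    """Hypothetical one-hidden-layer MLP: error history -> corrective torque."""

    def __init__(self, n_joints, history_len, hidden=16, seed=0):
        rng = random.Random(seed)
        # One position error and one velocity error per joint per history step.
        in_dim = 2 * n_joints * history_len
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Zero-initialized output layer: an untrained net leaves the
        # simulated actuator unchanged (zero correction).
        self.w2 = [[0.0] * hidden for _ in range(n_joints)]
        self.b2 = [0.0] * n_joints

    def __call__(self, error_history):
        # error_history: list of (pos_err, vel_err) pairs, oldest first,
        # where each element is a list with one entry per joint.
        x = [e for pos_err, vel_err in error_history for e in pos_err + vel_err]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]


def corrected_torque(net, error_history, q_des, q, qd, kp=20.0, kd=0.5):
    """Torque applied in simulation: base model plus learned correction."""
    base = pd_torque(q_des, q, qd, kp, kd)
    delta = net(error_history)  # corrective torque, one value per joint
    return [b + d for b, d in zip(base, delta)]
```

In a simulator loop, `corrected_torque` would replace the plain PD torque at every control step, so a policy trained afterwards experiences the corrected actuator dynamics. Because the output layer starts at zero, the correction only changes the simulation once the network has been fitted to real-world data.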