Bridging the Sim-to-Real Gap for Athletic Loco-Manipulation
Nolan Fey, Gabriel B. Margolis, Martin Peticco, Pulkit Agrawal
TL;DR
The paper addresses the sim-to-real gap in athletic loco-manipulation for legged robots by introducing an Unsupervised Actuator Net (UAN), which learns corrective actuator dynamics from real-world data without torque sensing. This real-to-sim-to-real calibration is combined with a two-stage training regime for the whole-body controller (WBC): a pre-training phase with random references to establish motion priors, followed by task-specific fine-tuning with a reward that treats the reference trajectory as a hint for exploration rather than a strict tracking target. The authors demonstrate the approach on a Unitree B2 with a Z1 Pro arm across three athletic tasks (ball throwing, dumbbell snatch, and sled pulling), reporting strong real-world performance and closer sim-to-real agreement than baselines. The results highlight the practical potential of combining residual actuator modeling with guided but reward-driven policy optimization to enable dynamic, coordinated, athletic behaviors on hardware. Limitations include reliance on task-specific reference trajectories for fine-tuning and actuator calibration that focuses on the arm, suggesting avenues for future work in broader subsystem modeling and automated reference generation.
Abstract
Achieving athletic loco-manipulation on robots requires moving beyond traditional tracking rewards, which simply guide the robot along a reference trajectory, to task rewards that drive truly dynamic, goal-oriented behaviors. Commands such as "throw the ball as far as you can" or "lift the weight as quickly as possible" compel the robot to exhibit the agility and power inherent in athletic performance. However, training solely with task rewards introduces two major challenges: these rewards are prone to exploitation (reward hacking), and the exploration process can lack sufficient direction. To address these issues, we propose a two-stage training pipeline. First, we introduce the Unsupervised Actuator Net (UAN), which leverages real-world data to bridge the sim-to-real gap for complex actuation mechanisms without requiring access to torque sensing. UAN mitigates reward hacking by ensuring that the learned behaviors remain robust and transferable. Second, we use a pre-training and fine-tuning strategy that leverages reference trajectories as initial hints to guide exploration. With these innovations, our robot athlete learns to lift, throw, and drag with remarkable fidelity from simulation to reality.
