Sampling-Based System Identification with Active Exploration for Legged Robot Sim2Real Learning

Nikhil Sobanbabu, Guanqi He, Tairan He, Yuxiang Yang, Guanya Shi

TL;DR

This work tackles the sim-to-real gap in legged robotics by introducing SPI-Active, a two-stage, sampling-based system identification framework that does not require differentiable simulators or ground-truth torques. Stage 1 performs robust inertial and actuator parameter estimation from real trajectories using parallel sampling, while Stage 2 uses active exploration to maximize Fisher Information and refine the parameters through optimized command sequences of a pre-trained multi-behavior policy. The method yields superior open-loop prediction and sim-to-real transfer on Unitree Go2 and G1 platforms across multiple tasks, with 42–63% performance gains over baselines. By combining principled parameter identification with information-driven data collection, SPI-Active provides a scalable, data-efficient pathway to high-fidelity sim-to-real robotics for diverse legged locomotion tasks.
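The Stage 1 idea (sample candidate parameters, roll each one out in simulation, keep those whose trajectories best match the recorded data) can be illustrated on a toy problem. The sketch below is not the authors' implementation: the damped point mass, the `simulate`, `prediction_error`, and `identify` helpers, the parameter bounds, and the choice of a cross-entropy-style sampler are all assumptions made for illustration, and the "real" trajectory is itself generated in simulation with hidden parameters.

```python
import random
import statistics

def simulate(mass, damping, forces, dt=0.02):
    """Roll out a damped 1-D point mass from rest; return (position, velocity) per step."""
    pos, vel = 0.0, 0.0
    states = []
    for force in forces:
        vel += (force - damping * vel) / mass * dt
        pos += vel * dt
        states.append((pos, vel))
    return states

def prediction_error(params, forces, real_states):
    """Mean squared state error between a simulated rollout and the recorded one."""
    sim_states = simulate(params[0], params[1], forces)
    total = 0.0
    for (p_sim, v_sim), (p_real, v_real) in zip(sim_states, real_states):
        total += (p_sim - p_real) ** 2 + (v_sim - v_real) ** 2
    return total / len(real_states)

def identify(forces, real_states, bounds, iters=20, pop=150, n_elite=15, seed=0):
    """Cross-entropy-style search (an assumed optimizer): sample, keep the best, refit, repeat."""
    rng = random.Random(seed)
    mean = [(lo + hi) / 2 for lo, hi in bounds]
    std = [(hi - lo) / 2 for lo, hi in bounds]
    for _ in range(iters):
        # Draw candidates around the current mean and clip them to the bounds.
        population = [
            [min(max(rng.gauss(m, s), lo), hi) for m, s, (lo, hi) in zip(mean, std, bounds)]
            for _ in range(pop)
        ]
        # Each candidate is an independent rollout, so this step parallelizes trivially.
        population.sort(key=lambda c: prediction_error(c, forces, real_states))
        elite = population[:n_elite]
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elite)]
    return mean

# Stand-in for logged hardware data: a rollout with hidden "true" parameters.
TRUE_PARAMS = (2.0, 0.8)  # hypothetical mass [kg] and damping [N*s/m]
forces = [1.0 if (t // 100) % 2 == 0 else -1.0 for t in range(400)]
real_states = simulate(TRUE_PARAMS[0], TRUE_PARAMS[1], forces)

bounds = [(0.5, 5.0), (0.1, 2.0)]  # hypothetical search ranges
initial_guess = [(lo + hi) / 2 for lo, hi in bounds]
estimate = identify(forces, real_states, bounds)
print("estimated mass, damping:", estimate)
```

Only forward rollouts and a scalar error are needed here, which is why this style of search works without simulator gradients or torque measurements. On a real robot the rollouts would come from a physics simulator and the reference states from logged sensor data.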

Abstract

Sim-to-real discrepancies hinder learning-based policies from achieving high-precision tasks in the real world. While Domain Randomization (DR) is commonly used to bridge this gap, it often relies on heuristics and can lead to overly conservative policies with degraded performance when not properly tuned. System Identification (Sys-ID) offers a targeted approach, but standard techniques rely on differentiable dynamics and/or direct torque measurement, assumptions that rarely hold for contact-rich legged systems. To this end, we present SPI-Active (Sampling-based Parameter Identification with Active Exploration), a two-stage framework that estimates physical parameters of legged robots to minimize the sim-to-real gap. SPI-Active robustly identifies key physical parameters through massive parallel sampling, minimizing state prediction errors between simulated and real-world trajectories. To further improve the informativeness of collected data, we introduce an active exploration strategy that maximizes the Fisher Information of the collected real-world trajectories by optimizing the input commands of an exploration policy. This targeted exploration leads to accurate identification and better generalization across diverse tasks. Experiments demonstrate that SPI-Active enables precise sim-to-real transfer of learned policies to the real world, outperforming baselines by 42–63% in various locomotion tasks.
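To make "maximizing the Fisher Information" of a trajectory concrete, the following minimal sketch scores a few candidate command sequences by how much information they carry about unknown parameters. It rests on several assumptions that are not taken from the paper: a toy damped point mass stands in for the robot, velocity measurements carry additive Gaussian noise with standard deviation `sigma`, sensitivities are approximated by central finite differences, and the scalar criterion is the log-determinant of the Fisher Information Matrix. All function names and numeric values are hypothetical.

```python
import math

def simulate(mass, damping, forces, dt=0.02):
    """Velocity trajectory of a damped 1-D point mass starting from rest."""
    vel = 0.0
    velocities = []
    for force in forces:
        vel += (force - damping * vel) / mass * dt
        velocities.append(vel)
    return velocities

def fisher_information(params, forces, sigma=0.05, eps=1e-4):
    """Gaussian-noise FIM: (1 / sigma^2) * sum_t J_t^T J_t, with J_t from central differences."""
    sensitivities = []
    for i in range(len(params)):
        upper, lower = list(params), list(params)
        upper[i] += eps
        lower[i] -= eps
        v_up = simulate(upper[0], upper[1], forces)
        v_lo = simulate(lower[0], lower[1], forces)
        sensitivities.append([(a - b) / (2 * eps) for a, b in zip(v_up, v_lo)])
    return [
        [sum(a * b for a, b in zip(s_i, s_j)) / sigma ** 2 for s_j in sensitivities]
        for s_i in sensitivities
    ]

def log_det_2x2(matrix):
    """Log-determinant of a 2x2 matrix (assumed scalarization); -inf if not positive definite."""
    det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    return math.log(det) if det > 0 else float("-inf")

nominal = (2.0, 0.8)  # hypothetical current estimate of mass [kg] and damping [N*s/m]
steps = 400
candidates = {  # candidate command sequences with the same amplitude bound
    "constant": [1.0] * steps,
    "square": [1.0 if (t // 100) % 2 == 0 else -1.0 for t in range(steps)],
    "sine": [math.sin(2 * math.pi * t / 200) for t in range(steps)],
}
scores = {name: log_det_2x2(fisher_information(nominal, cmd)) for name, cmd in candidates.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: log det FIM = {score:.2f}")
print("most informative candidate:", best)
```

A higher score means the trajectory constrains the parameters more tightly, so the selected command sequence is the one worth executing on hardware before re-running identification. In the setting described above, the candidates would be command sequences fed to an exploration policy rather than raw forces.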

Paper Structure

This paper contains 29 sections, 15 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: SPI-Active enables high-fidelity Sim-to-Real transfer across diverse locomotion tasks. To highlight the precision, all tasks are open-loop tracking without global position feedback. (a) High-Speed Weave Pole Navigation, (b) Precise Forward Jump, (c) Precise Yaw Jump, and (d) Humanoid Precise Velocity Tracking.
  • Figure 2: Overview of SPI-Active. Data Collection: Collect real-world trajectories using RL policies or motion priors. Parameter Identification: Estimate physical parameters via simulation-to-real rollout matching by sampling-based optimization. Active Exploration: Optimize input commands of a multi-behavioral policy to maximize Fisher Information and gather informative data. Downstream Task Training: Use identified parameters to train accurate locomotion controllers.
  • Figure 3: Open-Loop Locomotion Tasks. Forward Jump: Jump forward to a predefined distance of 0.85 m. Yaw Jump: Jump and rotate to a predefined yaw angle of 135 degrees. Velocity Tracking: Track a sequence of open-loop 2D twist commands. Attitude Tracking: Track a sequence of roll and pitch commands. Humanoid Velocity Tracking: Track a sequence of 2D twist velocity commands for a humanoid.
  • Figure 4: Comparison between SPI-Active and Vanilla policies in both simulation and real-world execution. SPI-Active yields a closer match to simulation, suggesting improved sim-to-real consistency.
  • Figure 5: Task performance comparison of SPI-Active vs. Vanilla in (i) Forward Jump, (ii) Yaw Jump, and (iii) Velocity Tracking.
  • ...and 1 more figure