Sampling Strategies for Robust Universal Quadrupedal Locomotion Policies

David Rytz, Kim Tien Ly, Ioannis Havoutis

TL;DR

This paper tackles universal quadrupedal locomotion across diverse robot morphologies by sampling robot configurations and joint PD gains during training, so that a single reinforcement learning policy covers many robots. The authors propose a modular architecture with a dynamics-encoding estimator, an actor-critic policy, and multiple sampling schemes (including a particle-filter-based adaptive approach), combined with domain randomization to encourage cross-robot generalization. Their results show that mass-independent configuration sampling with full PD gain ranges yields robust sim-to-real transfer to both small and large quadrupeds, notably the ANYmal hardware, outperforming several baseline strategies. The work demonstrates that careful parameter sampling, especially of joint gains, is crucial for bridging the sim-to-real gap, and it offers a scalable path toward deploying one morphology-agnostic policy across multiple quadrupedal platforms.

Abstract

This work focuses on sampling strategies of configuration variations for generating robust universal locomotion policies for quadrupedal robots. We investigate the effects of sampling physical robot parameters and joint proportional-derivative gains to enable training a single reinforcement learning policy that generalizes to multiple parameter configurations. Three fundamental joint gain sampling strategies are compared: parameter sampling with (1) linear and polynomial function mappings of mass-to-gains, (2) performance-based adaptive filtering, and (3) uniform random sampling. We improve the robustness of the policy by biasing the configurations using nominal priors and reference models. All training was conducted on RaiSim, tested in simulation on a range of diverse quadrupeds, and zero-shot deployed onto hardware using the ANYmal quadruped robot. Compared to multiple baseline implementations, our results demonstrate the need for significant joint controller gains randomization for robust closing of the sim-to-real gap.
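The three joint gain sampling strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, gain ranges, polynomial coefficients, and noise scales are all hypothetical, and the adaptive step assumes that gain pairs with a higher measured success rate are resampled more often, which the abstract does not specify.

```python
import random


def sample_gains_uniform(rng, kp_range, kd_range):
    """Strategy (3): draw P and D gains independently and uniformly."""
    return rng.uniform(*kp_range), rng.uniform(*kd_range)


def poly(coeffs, x):
    """Evaluate a polynomial, highest-degree coefficient first (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def sample_gains_from_mass(mass, kp_coeffs, kd_coeffs):
    """Strategy (1): map robot mass to gains; two coefficients give a linear map."""
    return poly(kp_coeffs, mass), poly(kd_coeffs, mass)


def resample_particles(rng, particles, success_rates, noise_std):
    """Strategy (2), sketched as one particle-filter-style resampling step.

    Assumption: particles (kp, kd) are redrawn in proportion to their
    success rate, then jittered with Gaussian noise (kp_std, kd_std).
    """
    total = sum(success_rates)
    # Fall back to equal weights if no particle succeeded at all.
    weights = success_rates if total > 0 else [1.0] * len(particles)
    chosen = rng.choices(particles, weights=weights, k=len(particles))
    kp_std, kd_std = noise_std
    return [(kp + rng.gauss(0.0, kp_std), kd + rng.gauss(0.0, kd_std))
            for kp, kd in chosen]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_gains_uniform(rng, (20.0, 100.0), (0.2, 2.0)))
    print(sample_gains_from_mass(12.0, [2.0, 5.0], [0.05, 0.1]))
    particles = [(30.0, 0.5), (60.0, 1.0), (90.0, 1.5)]
    print(resample_particles(rng, particles, [0.2, 0.9, 0.4], (2.0, 0.05)))
```

In this sketch the uniform and mass-mapped samplers are stateless, while the adaptive sampler carries a particle set between training iterations, so only the latter depends on how the policy is currently performing.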

Paper Structure

This paper contains 22 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Example run on the ANYmal hardware. The robot stands still in frames 1 and 6.
  • Figure 2: Extending the frameworks of \cite{luoMorALLearningMorphologically2024, rytzReferenceFreePlatform2025}, we propose the following pipeline. A buffer stores the base states, joint states, and actions, which are passed to the morphology and base linear velocity estimator (red). These estimates are then input to the actor to produce a control action $\mathbf{q}^{des}$ (blue), enabling quadruped locomotion through a joint PD controller. Candidate morphologies and actuator parameters (green) are trained and assessed using our model generation procedure on nominal robot models (grey, bottom right; Section \ref{ss:robot_generation}). The final actor is deployed on the ANYmal platform.
  • Figure 3: Success rates for various parameter sampling strategies for different perturbations and dynamics parameters for the A1 (left column) and ANYmal (right column) quadrupeds. During random walking, we measure $\text{SR}^*$ over base perturbation in the horizontal plane, ground friction coefficient, and base mass changes, showing in black the minimum and maximum training range of the disturbance parameters.
  • Figure 4: The success rate $\text{SR}^*$ is measured over a range of PD gain combinations, with yellow indicating good locomotion capabilities and dark blue indicating low to no stable performance. The red-white dot indicates the nominal PD gains as per the original implementation. The tests were run in RaiSim for the A1 (top two rows) and ANYmal (bottom two rows). The different $\mathbf{c}_i$ sampling strategies go from left to right as Uniform ($\text{SR}=1$) or particle filter (with SR adaptive), GenLoco ($\text{SR}=1$ or adaptive), MorAL ($\text{SR}=1$ or adaptive), and URMA ($\text{SR}=1$ or adaptive). We removed ManyQuadrupeds as it performed worse than GenLoco. In red, we show the nominal gains for the respective sampling type and robot.
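
The captions of Figures 2 and 4 both rely on a joint PD controller that tracks the actor's target $\mathbf{q}^{des}$ under a given pair of gains. A minimal sketch of such a control law is given below, assuming the common form with zero desired joint velocity; the exact law used in the paper is not stated in this excerpt, and the function name and example values are hypothetical.

```python
def pd_torques(q_des, q, q_dot, kp, kd):
    """Joint torques from a PD law: kp * (q_des - q) - kd * q_dot.

    Assumption: the desired joint velocity is zero, so the D term only
    damps the measured joint velocity.
    """
    return [kp * (target - pos) - kd * vel
            for target, pos, vel in zip(q_des, q, q_dot)]


if __name__ == "__main__":
    # One joint, 0.2 rad position error, moving at 1 rad/s.
    print(pd_torques([0.5], [0.3], [1.0], kp=50.0, kd=2.0))
```

Sweeping `kp` and `kd` over a grid and recording a success rate per cell would produce the kind of heat map that Figure 4 describes.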