Learning and Deploying Robust Locomotion Policies with Minimal Dynamics Randomization

Luigi Campanaro; Siddhant Gangapurwala; Wolfgang Merkt; Ioannis Havoutis

TL;DR

The paper presents Extended Random Force Injection (ERFI) as a minimal, parameter-efficient alternative to dynamics randomization for training robust quadrupedal locomotion policies in simulation. ERFI combines random perturbations of the joint torques with an episodic actuation offset (variants include ERFI-C and ERFI-50), so that training covers both local and global variations in the dynamics and sim-to-real transfer does not depend on extensive system identification. In the reported experiments, ERFI-based policies outperform standard baselines and broad domain randomization: robustness to mass variations improves by 53% on average over RFI, and by about 61% when a payload arm is added. The approach is validated on ANYmal C and Unitree A1 across flat and uneven terrain. The work demonstrates practical hardware deployment and suggests ERFI as a lightweight alternative to actuator networks and heavy randomization for real-world legged locomotion.

Abstract

Training deep reinforcement learning (DRL) locomotion policies often requires massive amounts of data to converge to the desired behaviour. In this regard, simulators provide a cheap and abundant source of data. For successful sim-to-real transfer, exhaustively engineered approaches such as system identification, dynamics randomization, and domain adaptation are generally employed. As an alternative, we investigate a simple strategy of random force injection (RFI) to perturb system dynamics during training. We show that the application of random forces enables us to emulate dynamics randomization. This allows us to obtain locomotion policies that are robust to variations in system dynamics. We further extend RFI by introducing an episodic actuation offset, and refer to the result as extended random force injection (ERFI). We demonstrate that ERFI provides additional robustness to variations in system mass, offering on average a 53% improvement in performance over RFI. We also show that ERFI is sufficient for successful sim-to-real transfer on two different quadrupedal platforms, ANYmal C and Unitree A1, even for perceptive locomotion over uneven terrain in outdoor environments.
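The two perturbations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform sampling distribution, the 12-joint layout, the numeric torque limits, and the choice to add both terms to the same command are all assumptions made for illustration. The only ideas taken from the text are that a random torque is injected during training and that an additional offset is held fixed for an episode.

```python
import numpy as np

def sample_episode_offset(rng, n_joints, tau_o_lim):
    """Draw one actuation offset per joint; the caller keeps it fixed for the episode."""
    return rng.uniform(-tau_o_lim, tau_o_lim, size=n_joints)

def inject_random_force(tau_cmd, rng, tau_r_lim, episode_offset=None):
    """Add a freshly sampled torque perturbation, plus an optional episodic offset."""
    tau_cmd = np.asarray(tau_cmd, dtype=float)
    # Per-step perturbation: resampled on every call (assumed uniform).
    noise = rng.uniform(-tau_r_lim, tau_r_lim, size=tau_cmd.shape)
    if episode_offset is None:
        return tau_cmd + noise                 # per-step injection only
    return tau_cmd + noise + episode_offset    # per-step injection plus episodic offset

rng = np.random.default_rng(0)
n_joints = 12                     # assumption: 12 actuated joints
tau_r_lim, tau_o_lim = 2.0, 1.0   # hypothetical limits in N*m, not values from the paper

# Sampled once at episode reset, then reused for every control step.
offset = sample_episode_offset(rng, n_joints, tau_o_lim)
tau_cmd = np.zeros(n_joints)      # stand-in for the controller's commanded torques
torques = [inject_random_force(tau_cmd, rng, tau_r_lim, offset) for _ in range(5)]
```

In a training loop, `sample_episode_offset` would be called at each environment reset and `inject_random_force` at each control step, so the policy experiences both fast, step-to-step disturbances and a slow bias that persists over the episode.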

Paper Structure (16 sections, 8 equations, 5 figures)

Introduction
Related Works
Preliminaries
System Model
Impedance Control
Rigid Body Dynamics Model
Extended Random Force Injection
Why does ERFI work?
How does RFI model delays?
How does RAO model mass and kinematic variations?
Problem Definition
Perceptive Quadrupedal Locomotion
Blind Quadrupedal Locomotion
Experimental Setup
Results
...and 1 more sections

Figures (5)

Figure 1: Deployment of the perceptive and blind locomotion policies on the ANYmal C and Unitree A1 quadrupedal platforms trained using our proposed ERFI-50 strategy without requiring actuation modeling or explicit randomization of dynamics or actuation properties.
Figure 2: The magnitudes of $\tau^{lim}_{r_j}$ and $\tau^{lim}_{o_j}$ affect the dynamics of the system.
Figure 3: (Left) Examples of stairs with varying step-height and step-depth used for evaluation. (Center) ANYmal C walking on stairs with an unmodeled Kinova manipulator. (Right) ANYmal C walking on rocky terrain during tests.
Figure 4: Selected experiments on Unitree A1 using ERFI, also available on the accompanying website https://sites.google.com/view/erfi-video. a) Walking on wet terrain and recovering from slipping, b) resisting external forces, c) withstanding impulsive forces, d) walking on soft terrain, e) walking with an unknown 5 kg payload, f) walking on wooden cylinders, g) traversing a ramp, and h) adapting to a $K_p^{RH KFE}$ equal to a third of the original value.
Figure 5: The success-rate panels without the arm (base mass, external force magnitude, friction) show how the compared policies, including ActNetRand, resist variations of the base mass, external forces, and different friction values. The corresponding panels with the arm repeat the same experiments with a Kinova manipulator mounted on top of the robot. The rocky-terrain panels (base mass, external force magnitude) investigate the effects of the perturbations on the rocky terrain shown in the test-environments figure. The final panel studies how different $\tau^{lim}_{o_j}$ and $\tau^{lim}_{r_j}$ affect the robustness of the controller.
