Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion

Tairan He; Chong Zhang; Wenli Xiao; Guanqi He; Changliu Liu; Guanya Shi

TL;DR

ABS is a dual-policy framework for agile yet safe legged locomotion. It combines a learned agile policy with a policy-conditioned reach-avoid (RA) value network and a recovery policy guided by RA-value gradients. The system uses a low-dimensional exteroceptive ray representation and a ray-prediction network to enable real-time collision avoidance with onboard sensing. Training takes place entirely in simulation, with domain randomization and a curriculum, and the trained modules deploy directly on a Unitree Go1 with onboard computation. Real-world experiments report peak speeds of up to 3.1 m/s while avoiding static and dynamic obstacles in indoor and outdoor environments, and extensive analyses examine the design choices that balance agility, safety, perception, and sim-to-real transfer. The work advances safe, high-speed locomotion by integrating model-free learning with control-theoretic safety principles in a closed-loop, policy-conditioned framework.

Abstract

Legged robots navigating cluttered environments must be jointly agile for efficient task execution and safe to avoid collisions with obstacles or humans. Existing studies either develop conservative controllers (< 1.0 m/s) to ensure safety, or focus on agility without considering potentially fatal collisions. This paper introduces Agile But Safe (ABS), a learning-based control framework that enables agile and collision-free locomotion for quadrupedal robots. ABS involves an agile policy to execute agile motor skills amidst obstacles and a recovery policy to prevent failures, collaboratively achieving high-speed and collision-free navigation. The policy switch in ABS is governed by a learned control-theoretic reach-avoid value network, which also guides the recovery policy as an objective function, thereby safeguarding the robot in a closed loop. The training process involves the learning of the agile policy, the reach-avoid value network, the recovery policy, and an exteroception representation network, all in simulation. These trained modules can be directly deployed in the real world with onboard sensing and computation, leading to high-speed and collision-free navigation in confined indoor and outdoor spaces with both static and dynamic obstacles.
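The policy switch described in the abstract can be illustrated with a minimal sketch. It follows the convention stated in the Figure 2 caption (agile policy when the estimated RA value is below a threshold, recovery policy otherwise). The function names, the toy RA value, the toy policies, and the default threshold of 0.0 are all hypothetical stand-ins for illustration, not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]  # assumed here: a list of ray distances in meters


def select_action(
    obs: Observation,
    ra_value: Callable[[Observation], float],
    agile_policy: Callable[[Observation], str],
    recovery_policy: Callable[[Observation], str],
    v_threshold: float = 0.0,  # hypothetical threshold, not taken from the paper
) -> Tuple[str, str]:
    """Return (active policy name, action) for one control step."""
    v_hat = ra_value(obs)
    if v_hat < v_threshold:
        # Estimated RA value is below the threshold: treat the state as safe
        # and let the agile policy act.
        return "agile", agile_policy(obs)
    # Otherwise hand control to the recovery policy.
    return "recovery", recovery_policy(obs)


def toy_ra_value(obs: Observation) -> float:
    # Toy stand-in for the learned network: the value rises as the nearest
    # ray distance shrinks below 0.5 m.
    return 0.5 - min(obs)


def toy_agile_policy(obs: Observation) -> str:
    return "run_to_goal"


def toy_recovery_policy(obs: Observation) -> str:
    return "track_safe_twist"
```

Calling `select_action([2.0, 3.0], toy_ra_value, toy_agile_policy, toy_recovery_policy)` selects the agile branch under these toy definitions, while an observation with a ray at 0.2 m selects the recovery branch. In the real system the RA value is a learned network conditioned on the agile policy, so only the branching logic carries over.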

Paper Structure (65 sections, 27 equations, 12 figures, 11 tables)

Introduction
Related Works
Agile Legged Locomotion
Legged Collision Avoidance
Safe Reinforcement Learning
Reach-Avoid Problems and Hamilton-Jacobi Analysis
Overview and Preliminaries
Nomenclature
Problem Formulation
Dynamics
Goal and Policy
Failure Set, Target Set and Reach-Avoid Set
Reach-Avoid Value and Time-Discounted Reach-Avoid Bellman Equation
System Structure
Learning Agile Policy
...and 50 more sections

Figures (12)

Figure 1: Our proposed framework ABS demonstrates agile and collision-free locomotion capabilities, where the robot, with fully onboard computation and sensing, can safely navigate through cluttered environments and rapidly react to diverse and dynamic obstacles, both indoors and outdoors. ABS involves a dual-policy setup: green lines at the bottom indicate the agile policy taking control, and red lines indicate the recovery policy in operation. The agile policy enables the robot to run fast amidst obstacles, and the recovery policy saves the robot from risky cases where the agile policy might fail. Subfigures: (a) The robot dodges a swinging human leg. (b) The agile policy enables the robot to run at a peak speed of $3.1$ m/s. (c) The robot dodges a moving stroller during high-speed locomotion. (d) The robot dodges a moving human in snowy terrain. (e) The robot safely navigates in a hall with both static and dynamic obstacles, with an average speed of $2.1$ m/s and a peak speed of $2.9$ m/s. (f) The robot avoids obstacles and moving humans in a dim corridor, with an average speed of $1.5$ m/s and a peak speed of $2.5$ m/s. (g) The robot, running outdoors at an average speed of $2.3$ m/s and a peak speed of $3.0$ m/s, avoids both moving and static trash bins and climbs up a grassy slope. Videos: see the website.
Figure 2: Overview of ABS: (a) There are four trained modules within the ABS framework: 1) Agile Policy (introduced in Section \ref{sec:agilepolicy}) is trained to achieve the maximum agility amidst obstacles; 2) Reach-Avoid Value Network (introduced in Section \ref{sec:reachavoidvalues}) is trained to predict the RA values conditioned on the agile policy as safety indicators; 3) Recovery Policy (introduced in Section \ref{sec:recoverypolicy}) is trained to track desired twist commands (2D linear velocity $v_x^c, v_y^c$ and yaw angular velocity $\omega_z^c$) that lower the RA values; 4) Ray-Prediction Network (introduced in Section \ref{sec:perception}) is trained to predict ray distances as the policies' exteroceptive inputs given depth images. (b) Illustration of the ABS deployment architecture. The dual-policy setup switches between the agile policy and the recovery policy based on the estimated $\hat{V}$ from the RA value network: 1) if $\hat{V} < V_{\text{threshold}}$, the agile policy is activated to navigate amidst obstacles; 2) if $\hat{V} \geq V_{\text{threshold}}$, the recovery policy is activated to track twist commands that lower the RA values via constrained optimization.
Figure 3: Example training environments. The magenta points indicate the goals, and the blue-green lines indicate the exteroceptive ray observations. Terrains from left to right: flat, low stumbling blocks, and rough.
Figure 4: Visualization of $\hat{V}$ with different linear velocities and 2D positions relative to the $3$ fixed obstacles. The angular velocities are set to zero, and the relative goal commands are set to $5$ m ahead of the robot. The grey circles represent the obstacles, and the colors represent the values of $\hat{V}$ at corresponding 2D positions. The first row presents the RA values trained with the softened failure function $\zeta$, while the second row uses the raw one in Eq. \ref{eq:failurefunc}. Without softening $\zeta$ toward Lipschitz continuity, the value estimate fails to indicate collisions on the sides of obstacles and shows local minima in front of them, compromising safety.
Figure 5: Various obstacles used for ray-prediction data collection.
...and 7 more figures
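The Figure 2 caption states that the recovery policy tracks twist commands that lower the RA values via constrained optimization. The excerpt does not give the optimization itself, so the following is only a sketch of one possible approach: projected gradient descent with finite-difference gradients over a twist $(v_x^c, v_y^c, \omega_z^c)$ under box constraints. The function `lower_ra_value`, the quadratic `toy_v_hat`, the bounds, and the step sizes are hypothetical and stand in for the learned value network and the authors' solver.

```python
from typing import Callable, List, Sequence


def lower_ra_value(
    v_hat: Callable[[Sequence[float]], float],
    twist0: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    step: float = 0.1,   # hypothetical step size
    iters: int = 50,     # hypothetical iteration budget
    eps: float = 1e-4,   # finite-difference perturbation
) -> List[float]:
    """Search for a twist inside [lower, upper] that lowers v_hat."""
    # Start from the initial twist clipped into the box constraints.
    twist = [min(max(t, lo), hi) for t, lo, hi in zip(twist0, lower, upper)]
    for _ in range(iters):
        base = v_hat(twist)
        grad = []
        for i in range(len(twist)):
            bumped = list(twist)
            bumped[i] += eps
            # Forward-difference estimate of the partial derivative.
            grad.append((v_hat(bumped) - base) / eps)
        # Gradient step, then projection back onto the box constraints.
        twist = [
            min(max(t - step * g, lo), hi)
            for t, g, lo, hi in zip(twist, grad, lower, upper)
        ]
    return twist


def toy_v_hat(twist: Sequence[float]) -> float:
    # Toy quadratic stand-in for the learned RA value as a function of twist.
    vx, vy, wz = twist
    return (vx - 0.5) ** 2 + (vy + 2.0) ** 2 + wz ** 2 - 1.0
```

With the toy value above, starting from `[1.5, 0.0, 1.0]` with bounds `[-1.5, -0.3, -3.0]` and `[1.5, 0.3, 3.0]`, the search moves the forward velocity toward 0.5 and pins the lateral velocity at its lower bound, because the unconstrained minimum lies outside the box. A learned network would normally supply gradients through automatic differentiation rather than finite differences.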
