Bridging Adaptivity and Safety: Learning Agile Collision-Free Locomotion Across Varied Physics

Yichao Zhong; Chong Zhang; Tairan He; Guanya Shi

TL;DR

Real-world legged locomotion requires balancing adaptability to unknown, time-varying physics with safety and agility. The authors propose BAS, which extends ABS with a policy-conditioned physical-parameter estimator, a learned reach-avoid (RA) value network, and an on-policy fine-tuning phase that reduces distribution shift during policy switching. Extensive simulation and real-world experiments show that BAS improves safety by about 50% over baselines in dynamic environments and, in the real world, runs 19.8% faster than ABS with a 2.36x lower collision rate, even with unknown payloads of up to 8 kg and on slippery terrain. These results suggest that BAS can enable robust, collision-free locomotion under varied real-world conditions and motivate future integration with higher-level planning and 3D perception.
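As a quick worked reading of the real-world headline numbers, treat the 2.36x figure as a ratio of collision rates and the 19.8% figure as a relative increase in average speed. Both are interpretations of the summary rather than separately reported results; $c$ denotes a collision rate and $v$ an average speed, notation introduced here for illustration:

```latex
\frac{c_{\text{BAS}}}{c_{\text{ABS}}} = \frac{1}{2.36} \approx 0.424
\;\Longrightarrow\; 1 - 0.424 \approx 57.6\%\ \text{fewer collisions relative to ABS},
\qquad
v_{\text{BAS}} \approx 1.198\, v_{\text{ABS}} .
```

Under this reading, BAS avoids a bit more than half of the collisions ABS incurs while also moving faster on average.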

Abstract

Real-world legged locomotion systems often need to reconcile agility and safety across different scenarios. Moreover, the underlying dynamics are often unknown and time-varying (e.g., payload, friction). In this paper, we introduce BAS (Bridging Adaptivity and Safety), which builds on the pipeline of the prior work Agile But Safe (ABS) (He et al.) and is designed to provide adaptive safety even in dynamic environments with uncertainties. BAS comprises an agile policy that avoids obstacles rapidly, a recovery policy that prevents collisions, a physical parameter estimator that is trained concurrently with the agile policy, and a learned control-theoretic reach-avoid (RA) value network that governs switching between the two policies. Both the agile policy and the RA network are conditioned on the physical parameters, which makes them adaptive. To mitigate distribution shift, we further introduce an on-policy fine-tuning phase for the estimator that improves its robustness and accuracy. In simulation, BAS achieves 50% better safety than baselines in dynamic environments while maintaining a higher average speed. In real-world experiments, BAS handles complex environments with unknown physics (e.g., slippery floors with unknown friction and unknown payloads of up to 8 kg), whereas the baselines lack adaptivity, leading to collisions or degraded agility. As a result, BAS achieves a 19.8% increase in speed and a 2.36 times lower collision rate than ABS in the real world. Videos: https://adaptive-safe-locomotion.github.io.
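The switching rule described in the abstract can be illustrated with a small toy. This is a sketch under stated assumptions, not the authors' implementation: every name here (`SwitchedController`, `threshold`, the lambda stand-ins) is hypothetical, the learned networks are replaced by hand-written functions, the rule "higher RA value means more dangerous" follows the Figure 4 caption, and the switching threshold of 0 is an assumption. Only the agile policy and the RA value receive the estimated parameters, matching the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]


@dataclass
class SwitchedController:
    """Toy sketch of RA-value-governed switching (all fields hypothetical).

    In BAS these callables would be learned networks; here they are plain functions.
    """
    estimator: Callable[[Vec], List[float]]          # history -> estimated params e_hat
    ra_value: Callable[[Vec, Vec], float]            # (state, e_hat) -> RA value, higher = more dangerous
    agile_policy: Callable[[Vec, Vec], List[float]]  # conditioned on e_hat
    recovery_policy: Callable[[Vec], List[float]]    # assumed here to use the state only
    threshold: float = 0.0                           # assumed switching threshold

    def act(self, state: Vec, history: Vec) -> Tuple[str, List[float]]:
        e_hat = self.estimator(history)
        if self.ra_value(state, e_hat) > self.threshold:
            # Predicted outcome looks unsafe for the estimated physics: hand over to recovery.
            return "recovery", self.recovery_policy(state)
        return "agile", self.agile_policy(state, e_hat)


# Toy stand-ins: state = [distance_to_obstacle_m, forward_speed_mps], e = [payload_kg].
ctrl = SwitchedController(
    estimator=lambda hist: [sum(hist) / len(hist)],           # average of a payload reading history
    ra_value=lambda s, e: s[1] * (1.0 + 0.1 * e[0]) - s[0],  # heavier payload -> higher RA value
    agile_policy=lambda s, e: [s[1]],                         # keep the current speed
    recovery_policy=lambda s: [0.0],                          # command a stop
)

state = [2.5, 2.0]
print(ctrl.act(state, history=[0.0, 0.0]))  # -> ('agile', [2.0])
print(ctrl.act(state, history=[8.0, 8.0]))  # -> ('recovery', [0.0])
```

For the same state, the estimated 8 kg payload pushes the toy RA value above the threshold and triggers recovery earlier, which mirrors the qualitative behavior described for Figure 1 (early recovery with a payload, late recovery without one).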

Paper Structure (26 sections, 1 theorem, 12 equations, 6 figures, 4 tables)

INTRODUCTION
Preliminaries and Problem Formulation
Dynamics
Goal Settings
Safety Settings
Reach-Avoid Value and Time-Discounted Reach-Avoid Bellman Equation (DRABE)
Lipschitz-continuity of $V^\pi_\gamma$
METHODOLOGIES
Phase 1: Joint-Train Agile Policy and Physical Parameter Estimator
Policy-Conditioned Physical Parameter Estimator
Policy Training
Training Pipeline
Phase 2: Learning Adaptive Reach-Avoid Network
Phase 3: On-Policy Estimator Fine-Tuning
EXPERIMENTS
...and 11 more sections

Key Result

Theorem 1

(Lipschitz continuity of $V^\pi_\gamma$ with respect to $e$) The learned value function $V^\pi_\gamma(s,e)$ is Lipschitz continuous with respect to the environmental factors $e$ under the following conditions:

Figures (6)

Figure 1: 1) The robot achieves collision-free locomotion even on extremely slippery terrain (soapy water on both the floor and the robot's feet) and adapts when the terrain suddenly changes to rough, dry carpet. 2) Adaptive recovery triggering in different circumstances: a) early recovery with an 8 kg payload and b) late recovery with no payload.
Figure 2: BAS Pipeline Overview.
Figure 3: Mass estimation tracking of BAS, BAS w/o fusion, and BAS w/o the joint-training pipeline. The environment and the history buffer are reset every 8 s.
Figure 4: Heatmaps of RA values under different payload masses, with the base moving straight forward at 3.0 m/s. Redder regions indicate higher RA values (more dangerous); bluer regions indicate lower RA values (safer).
Figure 5: Real-world experiment settings for the adaptive safety tests. The yellow triangle marks the starting point and the red triangle marks the goal. Once the robot reaches the goal, the goal and starting point are swapped. A trajectory from the start to the goal and back to the start without collision counts as a success. 0) Vanilla test: same setting as the mass test, but without a payload. 1) Mass test: carry a 5 kg payload in a corridor and avoid boxes. 2) Friction test: avoid a box and a slip sign on a very slippery floor and a dry carpet. 3) Slope test: avoid a cone on a grass slope after rain, which is also very slippery.
...and 1 more figure

Theorems & Definitions (2)

theorem 1
proof
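Theorem 1 relies on the standard notion of Lipschitz continuity. Written out for a fixed state $s$, its conclusion has the form below; the constant $L$, and whether it holds uniformly in $s$, depend on the theorem's conditions. The second line is a direct consequence, where $\hat e$ (an estimate of $e$) and the error bound $\delta$ are notation introduced for this illustration:

```latex
% Conclusion of Theorem 1 in standard form (fixed state s, some constant L \ge 0)
\bigl| V^\pi_\gamma(s, e_1) - V^\pi_\gamma(s, e_2) \bigr| \le L \,\lVert e_1 - e_2 \rVert
\qquad \text{for all } e_1, e_2 .

% Consequence: if the estimator error is bounded, \lVert \hat e - e \rVert \le \delta, then
\bigl| V^\pi_\gamma(s, \hat e) - V^\pi_\gamma(s, e) \bigr| \le L \,\delta .
```

One reading is that a small error in the estimated physical parameters can shift the RA value, and therefore the policy-switching decision, by at most a proportional amount.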
