SEA-Nav: Efficient Policy Learning for Safe and Agile Quadruped Navigation in Cluttered Environments

Shiyi Chen; Mingye Yang; Haiyan Mao; Jiaqi Zhang; Haiyi Liu; Shuheng He; Debing Zhang; Zihao Qiu; Chun Zhang

Abstract

Efficiently training quadruped robot navigation in densely cluttered environments remains a significant challenge. Existing methods either lack safety and agility even in simple obstacle distributions or move slowly in complex environments, and they often require excessively long training phases. To address this, we propose SEA-Nav (Safe, Efficient, and Agile Navigation), a reinforcement learning framework for quadruped navigation. Within diverse and dense obstacle environments, a differentiable control barrier function (CBF)-based shield constrains the navigation policy to output safe velocity commands. An adaptive collision replay mechanism and hazardous exploration rewards increase the probability of learning from critical experiences, guiding efficient exploration and exploitation. Finally, kinematic action constraints are incorporated to ensure safe velocity commands, facilitating successful physical deployment. To the best of our knowledge, this is the first approach that achieves highly challenging quadruped navigation in the real world with minute-level training time.
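To make the idea of a CBF-based shield on velocity commands concrete, the following is a minimal sketch, not the paper's implementation. It assumes a 2D single-integrator model (the command is the velocity), circular obstacles with a common radius, a log-sum-exp (LSE) soft minimum over per-obstacle distance barriers, and a closed-form damped projection onto the single constraint $\nabla h \cdot v + \alpha h \ge 0$. The function names `lse_barrier` and `cbf_shield`, the sharpness `k`, and the `damping` constant are all hypothetical choices for illustration; in SEA-Nav the gain $\alpha$ is predicted by the policy, whereas here it is passed in as a scalar.

```python
import numpy as np


def lse_barrier(x, obstacles, radius, k=10.0):
    """Soft-min barrier h(x) over circular obstacles and its gradient.

    Assumed per-obstacle barrier: h_i(x) = ||x - p_i|| - radius.
    The LSE soft minimum never exceeds min_i h_i, so it is conservative.
    """
    diff = x - obstacles                       # (N, 2) vectors from obstacles to robot
    dist = np.linalg.norm(diff, axis=1)        # (N,) center distances
    h_i = dist - radius                        # (N,) per-obstacle barriers
    z = -k * h_i
    z_max = z.max()                            # shift for numerical stability
    exp_z = np.exp(z - z_max)
    h = -(z_max + np.log(exp_z.sum())) / k     # soft minimum of h_i
    w = exp_z / exp_z.sum()                    # softmax weights, largest for nearest obstacle
    grad = (w[:, None] * diff / dist[:, None]).sum(axis=0)
    return h, grad


def cbf_shield(v_nom, h, grad, alpha, damping=1e-6):
    """Project v_nom onto {v : grad . v + alpha * h >= 0} in closed form.

    The damping term keeps the division well conditioned when grad is
    near zero, at the cost of satisfying the constraint only approximately.
    """
    residual = grad @ v_nom + alpha * h
    if residual >= 0.0:
        return v_nom.copy()                    # nominal command is already safe
    lam = -residual / (grad @ grad + damping)
    return v_nom + lam * grad                  # minimal correction along grad


# Usage: robot at the origin heading toward an obstacle at (1, 0).
x = np.array([0.0, 0.0])
obstacles = np.array([[1.0, 0.0], [0.0, 2.0]])
h, grad = lse_barrier(x, obstacles, radius=0.3)
v_safe = cbf_shield(np.array([1.0, 0.0]), h, grad, alpha=1.0)
```

In this sketch a smaller `alpha` tightens the constraint, so the shield intervenes earlier, which matches the role the Figure 4 caption gives the safety gain. Because every step is a closed-form expression, the same computation could be written in an autodiff framework so that gradients flow through the shield; that port is not shown here.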

Paper Structure (20 sections, 8 equations, 6 figures, 5 tables)

INTRODUCTION
RELATED WORK
Learning-Based Robot Navigation
Safe RL and Barrier Functions
METHOD
Key Challenges & Overview
System Pipeline & MDP Formulation
Adaptive Collision-State Initialization (ACSI)
Differentiable Adaptive LSE-CBF Layer
LSE Aggregation for Smooth Safety Constraints
Damped Analytical Safety Projection
End-to-End Differentiability as Inductive Bias
Loss Function Design
Training in Simulation
EXPERIMENTS
...and 5 more sections

Figures (6)

Figure 1: SEA-Nav is trained in minute-level time and deployed zero-shot in a previously unseen maze. The robot successfully escapes using the out-of-distribution built-in MPC controller and onboard sparse LiDAR.
Figure 2: Overview of the proposed SEA-Nav pipeline. LiDAR rays provide exteroceptive observations that are encoded and fused by the Encoder and Backbone into shared features. The Actor maps shared features to a navigation action head and a safety-gain $\alpha$ head, producing a nominal velocity command and an adaptive gain; the LSE-CBF Shield then solves for a safe velocity command. The Critic directly predicts state value from shared features. The Actor and Critic are optimized jointly with PPO, shield intervention, and kinematic regularization losses.
Figure 3: SEA-Nav trajectories in three navigation scenarios of increasing difficulty. Each subfigure shows two distinct start-goal trials. SEA-Nav successfully traverses narrow passages and performs timely maneuvering adjustments to avoid entrapment in cluttered environments.
Figure 4: (Left) Trajectory plot; (right) velocity profiles. SEA-Nav maintains larger obstacle clearance and smoother speed variations. The safety gain $\alpha$ decreases in hazardous regions so the CBF Shield dominates, and increases in safe regions where the nominal navigation command takes the lead.
Figure 5: Real-world experimental environments. Ten trials are conducted in each environment.
...and 1 more figures
