SEA-Nav: Efficient Policy Learning for Safe and Agile Quadruped Navigation in Cluttered Environments

Shiyi Chen, Mingye Yang, Haiyan Mao, Jiaqi Zhang, Haiyi Liu, Shuheng He, Debing Zhang, Zihao Qiu, Chun Zhang

Abstract

Efficiently training quadruped robot navigation in densely cluttered environments remains a significant challenge. Existing methods are either limited by a lack of safety and agility in simple obstacle distributions or suffer from slow locomotion in complex environments, often requiring excessively long training phases. To this end, we propose SEA-Nav (Safe, Efficient, and Agile Navigation), a reinforcement learning framework for quadruped navigation. Within diverse and dense obstacle environments, a differentiable control barrier function (CBF)-based shield constrains the navigation policy to output safe velocity commands. An adaptive collision replay mechanism and hazardous exploration rewards are introduced to increase the probability of learning from critical experiences, guiding efficient exploration and exploitation. Finally, kinematic action constraints are incorporated to ensure safe velocity commands, facilitating successful physical deployment. To the best of our knowledge, this is the first approach that achieves highly challenging quadruped navigation in the real world with minute-level training time.

Paper Structure (20 sections, 8 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: SEA-Nav is trained in minute-level time and deployed zero-shot in a previously unseen maze. The robot successfully escapes using the out-of-distribution built-in MPC controller and onboard sparse LiDAR.
  • Figure 2: Overview of the proposed SEA-Nav pipeline. LiDAR rays provide exteroceptive observations that are encoded and fused by the Encoder and Backbone into shared features. The Actor maps shared features to a navigation action head and a safety-gain $\alpha$ head, producing a nominal velocity command and an adaptive gain; the LSE-CBF Shield then solves for a safe velocity command. The Critic directly predicts state value from shared features. The Actor and Critic are optimized jointly with PPO, shield intervention, and kinematic regularization losses.
  • Figure 3: SEA-Nav trajectories in three navigation scenarios of increasing difficulty. Each subfigure shows two distinct start-goal trials. SEA-Nav successfully traverses narrow passages and performs timely maneuvering adjustments to avoid entrapment in cluttered environments.
  • Figure 4: (Left) Trajectory plot; (right) velocity profiles. SEA-Nav maintains larger obstacle clearance and smoother speed variations. The safety gain $\alpha$ decreases in hazardous regions so the CBF Shield dominates, and increases in safe regions where the nominal navigation command takes the lead.
  • Figure 5: Real-world experimental environments. Ten trials are conducted in each environment.
  • ...and 1 more figure
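
The captions for Figures 2 and 4 describe a shield that takes a nominal velocity command and a safety gain $\alpha$ and returns a safe velocity command, with smaller $\alpha$ letting the shield dominate. The formulation itself is not given in this excerpt, so the following is only a minimal sketch under stated assumptions: "LSE" is read as a log-sum-exp smooth minimum over obstacle clearances, the robot is modeled as a planar single integrator, and the shield is the standard closed-form CBF projection for one linear constraint $\nabla h \cdot u + \alpha h \ge 0$. The function names `lse_barrier` and `shield`, the safety radius `r_safe`, and the sharpness `k` are all hypothetical and do not come from the paper.

```python
import numpy as np


def lse_barrier(points, r_safe=0.3, k=10.0):
    """Smooth-min barrier h and its gradient w.r.t. the robot position.

    points: (N, 2) obstacle points in the robot frame (robot at the origin,
    no point exactly at the origin). r_safe and k are assumed parameters.
    """
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points, axis=1)
    d = dist - r_safe                      # per-point clearance
    m = d.min()                            # shift for numerical stability
    z = np.exp(-k * (d - m))
    h = m - np.log(z.sum()) / k            # log-sum-exp lower bound on min(d)
    w = z / z.sum()                        # softmax weights over the points
    # Moving the robot toward a point reduces its clearance, hence the minus.
    grad = -(w[:, None] * points / dist[:, None]).sum(axis=0)
    return h, grad


def shield(u_nom, points, alpha, r_safe=0.3, k=10.0, eps=1e-9):
    """Project u_nom onto the half-space grad_h . u + alpha * h >= 0."""
    u_nom = np.asarray(u_nom, dtype=float)
    h, a = lse_barrier(points, r_safe, k)
    slack = a @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom                       # nominal command is already safe
    # Minimum-norm correction along the barrier gradient.
    return u_nom - slack / (a @ a + eps) * a
```

With a single obstacle point one metre ahead and `alpha = 1.0`, `shield([1.0, 0.0], [[1.0, 0.0]], 1.0)` caps the forward speed at about 0.7; halving `alpha` to 0.5 caps it at about 0.35. In this sketch, that is how a smaller gain makes the shield more conservative, in line with the behavior described for Figure 4.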