CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation

Kai Yang, Tianlin Zhang, Zhengbo Wang, Zedong Chu, Xiaolong Wu, Yang Cai, Mu Xu

TL;DR

CE-Nav addresses the generalization of local navigation across diverse robot morphologies by decoupling universal geometric reasoning from embodiment-specific dynamics in a two-stage IL-then-RL framework. It introduces VelFlow, a conditional normalizing flow trained offline to model the multi-modal distribution of kinematically-sound actions produced by a classical planner, and a lightweight Dynamics-Aware Refiner that learns online, via curriculum-guided reinforcement learning, to adapt those proposals to a target robot’s dynamics. The approach achieves state-of-the-art performance with dramatically reduced adaptation data, demonstrated across quadrupeds, a biped, and a quadrotor, and transfers from simulation to real-world deployments. This framework offers a scalable path to generalizable, fast-adapting navigation systems capable of integrating with higher-level planners and perception modules.

Abstract

Generalizing local navigation policies across diverse robot morphologies is a critical challenge. Progress is often hindered by the need for costly and embodiment-specific data, the tight coupling of planning and control, and the "disastrous averaging" problem where deterministic models fail to capture multi-modal decisions (e.g., turning left or right). We introduce CE-Nav, a novel two-stage (IL-then-RL) framework that systematically decouples universal geometric reasoning from embodiment-specific dynamic adaptation. First, we train an embodiment-agnostic General Expert offline using imitation learning. This expert, a conditional normalizing flow model named VelFlow, learns the full distribution of kinematically-sound actions from a large-scale dataset generated by a classical planner, completely avoiding real robot data and resolving the multi-modality issue. Second, for a new robot, we freeze the expert and use it as a guiding prior to train a lightweight, Dynamics-Aware Refiner via online reinforcement learning. This refiner rapidly learns to compensate for the target robot's specific dynamics and controller imperfections with minimal environmental interaction. Extensive experiments on quadrupeds, bipeds, and quadrotors show that CE-Nav achieves state-of-the-art performance while drastically reducing adaptation cost. Successful real-world deployments further validate our approach as an efficient and scalable solution for building generalizable navigation systems. Code is available at https://github.com/amap-cvlab/CE-Nav.
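The "disastrous averaging" problem mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper: the turn rates (±0.8 rad/s), the noise level, and the function name `sample_expert_turn_rates` are illustrative assumptions. It shows that the squared-error-optimal deterministic prediction for a left-or-right decision is the mean of the two modes, while sampling from the distribution commits to one side.

```python
import random
import statistics

def sample_expert_turn_rates(n, seed=0):
    """Draw n angular-velocity labels (rad/s) from a toy bimodal expert.

    Assumption (hypothetical setup): facing an obstacle head-on, the expert
    turns left (+0.8) or right (-0.8) with equal probability, plus small
    Gaussian noise.
    """
    rng = random.Random(seed)
    return [rng.choice((-0.8, 0.8)) + rng.gauss(0.0, 0.05) for _ in range(n)]

labels = sample_expert_turn_rates(1000)

# A deterministic regressor trained with squared error converges to the
# mean of the labels, which here lies between the two modes.
mse_optimal = statistics.fmean(labels)

# A distributional model can instead draw from the learned distribution;
# resampling the labels stands in for sampling from such a model.
sampled = random.Random(1).choice(labels)

print(f"mean prediction: {mse_optimal:+.3f} rad/s (steers at the obstacle)")
print(f"sampled action:  {sampled:+.3f} rad/s (commits to one side)")
```

In this toy setting the averaged command is close to zero turn rate, which corresponds to neither expert behavior. A conditional generative model such as a normalizing flow avoids this by representing both modes rather than their mean.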

Paper Structure

This paper contains 33 sections, 8 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Overview of the CE-Nav two-stage framework. Stage 1 (Left): A multi-modal, embodiment-agnostic General Expert is trained offline via imitation learning on expert data. Stage 2 (Right): The frozen expert is used as a guiding prior to train a Dynamics-Aware Refiner via online reinforcement learning, allowing it to adapt to a specific robot's dynamics.
  • Figure 2: Examples of geometry simulation environments used for expert data generation. (a) Corridor environment. (b) Obstacle forest environment.
  • Figure 3: (a) The "obstacle forest" with $N_o=500$, $l=40\,\text{m}$. (b) Visualization of the 2D raycast input, where blue lines indicate rays (up to a 4 m range) that have detected an obstacle.
  • Figure 4: Multi-modal Decision-Making in CE-Nav. (a) 100 robots navigate past an obstacle by splitting into two groups. At the decision point: (b) The expert's reference velocity ($v_{\text{ref}}$) proposals form two distinct clusters, representing the choice to turn left or right. (c) The refiner's final velocity commands ($v_{\text{final}}$) maintain this bimodal structure while adjusting for dynamics.
  • Figure 5: Real-world navigation performance comparisons.
  • ...and 1 more figure