Learning Dynamic Weight Adjustment for Spatial-Temporal Trajectory Planning in Crowd Navigation

Muqing Cao, Xinhang Xu, Yizhuo Yang, Jianping Li, Tongxing Jin, Pengfei Wang, Tzu-Yi Hung, Guosheng Lin, Lihua Xie

TL;DR

The paper tackles crowd navigation by learning to adapt the objective weights of a spatial-temporal trajectory planner. It introduces a neural policy, trained with PPO in a POMDP setting, that maps sensor observations to a weight vector $\bm{w}=(w_T,w_f,w_{\dot{\theta}},w_s,w_h)$ for the planner; the observation encoding includes static and dynamic maps. The planner combines a 5th-order polynomial trajectory representation, soft feasibility penalties, and multi-objective costs to balance safety, efficiency, and goal attainment. Simulation results show improved safety and adaptability compared with fixed-weight planning and other learning-based baselines, and a real-world run along a crowded 300 m corridor demonstrates feasibility on an autonomous delivery robot.
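As a rough illustration of the two ingredients named above, the sketch below evaluates a 5th-order polynomial and forms a weighted sum of per-objective costs. It is a minimal sketch and not the paper's planner: the function names `quintic` and `weighted_cost`, the dictionary keys, and all numeric values are assumptions made for illustration, and the individual cost terms are placeholders rather than the paper's actual objectives.

```python
import numpy as np

# Keys mirror the subscripts of w = (w_T, w_f, w_thetadot, w_s, w_h) in the TL;DR.
# The dictionary layout is an illustrative assumption.
WEIGHT_NAMES = ("T", "f", "theta_dot", "s", "h")


def quintic(coeffs, t):
    """Evaluate c0 + c1*t + ... + c5*t**5 at time t (hypothetical helper)."""
    coeffs = np.asarray(coeffs, dtype=float)
    if coeffs.shape != (6,):
        raise ValueError("a 5th-order polynomial needs exactly 6 coefficients")
    powers = float(t) ** np.arange(6)  # [1, t, t^2, ..., t^5]
    return float(coeffs @ powers)


def weighted_cost(weights, costs):
    """Weighted sum of per-objective costs; both arguments are dicts keyed by WEIGHT_NAMES."""
    return sum(weights[name] * costs[name] for name in WEIGHT_NAMES)


# Placeholder cost values; in the paper these would come from the planner's objectives.
costs = {"T": 1.0, "f": 2.0, "theta_dot": 3.0, "s": 4.0, "h": 5.0}
fixed = {name: 1.0 for name in WEIGHT_NAMES}   # a fixed-weight setting
adapted = dict(fixed, T=2.0)                   # one weight changed, as a policy output might
```

Changing a single entry of the weight dictionary changes the total cost the optimizer would minimize, which is the lever the learned policy adjusts at run time.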

Abstract

Robot navigation in dense human crowds poses a significant challenge due to the complexity of human behavior in dynamic and obstacle-rich environments. In this work, we propose a dynamic weight adjustment scheme using a neural network to predict the optimal weights of objectives in an optimization-based motion planner. We adopt a spatial-temporal trajectory planner and incorporate diverse objectives to achieve a balance among safety, efficiency, and goal achievement in complex and dynamic environments. We design the network structure, observation encoding, and reward function to effectively train the policy network using reinforcement learning, allowing the robot to adapt its behavior in real time based on environmental and pedestrian information. Simulation results show improved safety compared to the fixed-weight planner and the state-of-the-art learning-based methods, and verify the ability of the learned policy to adaptively adjust the weights based on the observed situations. The approach's feasibility is demonstrated in a navigation task using an autonomous delivery robot across a crowded corridor over a 300 m distance.

Paper Structure

This paper contains 14 sections, 12 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: The real-world experiment of the proposed method.
  • Figure 2: Diagram of the proposed navigation system, which integrates sensor data processing with policy learning to adjust the weights of the planner.
  • Figure 3: The structure of the environment encoder.
  • Figure 4: Training environment.
  • Figure 5: Test scenes from left to right: (1) obstacle- and human-populated scene, (2) obstacle-free and human-dense scene, and (3) narrow indoor scene.
  • ...and 1 more figures