Developing Path Planning with Behavioral Cloning and Proximal Policy Optimization for Path-Tracking and Static Obstacle Nudging

Mingyan Zhou, Biao Wang, Tian Tan, Xiatao Sun

TL;DR

This paper introduces a path planning method that uses Behavioral Cloning for path-tracking and Proximal Policy Optimization for static obstacle nudging. The method outputs lateral offset values that adjust the given reference waypoints and supplies the modified path to different controllers.

Abstract

In autonomous driving, end-to-end methods based on Imitation Learning (IL) and Reinforcement Learning (RL) are becoming increasingly common. However, unlike the classic robotics workflow, they involve neither explicit reasoning nor planning over a horizon, which makes the resulting strategies implicit and myopic. In this paper, we introduce a path planning method that uses Behavioral Cloning (BC) for path-tracking and Proximal Policy Optimization (PPO) for static obstacle nudging. It outputs lateral offset values that adjust the given reference waypoints and supplies the modified path to different controllers. Experimental results show that the algorithm can follow paths in a way that mimics the expert performance of path-tracking controllers and can avoid collisions with fixed obstacles. The method is a promising attempt at applying learning-based planning to path planning problems in autonomous driving.
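To make the idea of lateral offsets concrete, the following is a minimal sketch of how a set of offsets could shift reference waypoints sideways to form a modified path. It is not the authors' implementation: the function name `apply_lateral_offsets`, the sign convention (positive means left of the direction of travel), and the finite-difference heading estimate are all assumptions made for illustration. In the paper the offsets come from the learned policy; here they are hard-coded.

```python
import math

def apply_lateral_offsets(waypoints, offsets):
    """Shift each waypoint along the local left-pointing path normal.

    waypoints: list of (x, y) tuples, at least two points.
    offsets:   signed distances, one per waypoint (assumed: positive = left).
    """
    if len(waypoints) < 2 or len(waypoints) != len(offsets):
        raise ValueError("need >= 2 waypoints and one offset per waypoint")
    shifted = []
    last = len(waypoints) - 1
    for i, ((x, y), d) in enumerate(zip(waypoints, offsets)):
        # Estimate the path heading from neighbouring waypoints
        # (central difference, one-sided at the two ends).
        x0, y0 = waypoints[max(i - 1, 0)]
        x1, y1 = waypoints[min(i + 1, last)]
        heading = math.atan2(y1 - y0, x1 - x0)
        # The left normal of a heading direction is (-sin, cos).
        shifted.append((x - d * math.sin(heading), y + d * math.cos(heading)))
    return shifted

# Hypothetical straight reference path along the x-axis.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# Hypothetical offsets standing in for a policy output.
offsets = [0.0, 0.5, 0.5, 0.0]
modified = apply_lateral_offsets(reference, offsets)
print(modified)
```

The modified waypoints can then be handed to any path-tracking controller in place of the original reference, which is what lets the same planner sit in front of different controllers.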

Paper Structure

This paper contains 15 sections, 11 equations, and 8 figures.

Figures (8)

  • Figure 1: Tracking a path through path-planning-based BC. Given a demonstration (blue trace) from the expert (single-track model in grey), the vehicle (single-track model in black) learns to mimic the expert, rather than deviating or colliding (yellow marks), by adjusting lateral offsets (red line segments) on the selected path (green dots), which is obtained from the current state (cyan dot), the reference waypoints (black dots), and the closest waypoint (crimson dot).
  • Figure 2: Static obstacle nudging through path-planning-based PPO. After bootstrapping with BC as illustrated in Fig. 1, the vehicle performs planning similar to the expert's (blue trace). To avoid obstacles (grey circle) that may block the path, the vehicle uses PPO to adjust the policy that outputs offsets, yielding a new path (purple trace). This is realized by adding new deviations (yellow line segments) to obtain new waypoints (pink dots).
  • Figure 3: Structure of path-tracking with path-planning-based BC. The controller directly takes the path from the waypoints as reference (blue lines) to train the policy. During the validation process, the offsets are added to the waypoints to get the modified path for the controller (red lines).
  • Figure 4: Structure of static obstacle nudging with path-planning-based PPO. The policy bootstrapped by BC is trained and tested using PPO to output lateral offsets that modify the path, thereby avoiding obstacles.
  • Figure 5: The BC performance for path-tracking with Pure Pursuit and MPC using different total timesteps. Modified paths and horizon paths are marked in red and yellow. Lookahead points are shown in cyan in Pure Pursuit plots.
  • ...and 3 more figures
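
The Figure 1 caption describes a selected path obtained from the current state, the reference waypoints, and the closest waypoint. One plausible reading of that step is sketched below: find the waypoint nearest to the vehicle and take a fixed number of waypoints from there as the planning horizon. The function name `select_horizon`, the `horizon` parameter, and the choice to truncate at the end of an open path (rather than wrap around a closed track) are assumptions for illustration, not details confirmed by the source.

```python
import math

def select_horizon(position, waypoints, horizon):
    """Return (closest_index, selected_waypoints).

    position:  (x, y) of the vehicle's current state.
    waypoints: list of (x, y) reference waypoints, non-empty.
    horizon:   number of waypoints to select (hypothetical parameter).
    """
    if not waypoints or horizon < 1:
        raise ValueError("need at least one waypoint and horizon >= 1")
    px, py = position
    # Index of the reference waypoint closest to the current position.
    closest = min(
        range(len(waypoints)),
        key=lambda i: math.hypot(waypoints[i][0] - px, waypoints[i][1] - py),
    )
    # Assumed open path: the slice simply stops at the last waypoint.
    return closest, waypoints[closest:closest + horizon]

# Hypothetical reference waypoints and vehicle position.
reference = [(float(i), 0.0) for i in range(6)]
closest_index, selected = select_horizon((1.2, 0.3), reference, horizon=3)
print(closest_index, selected)
```

Lateral offsets would then be applied only to the selected waypoints, so the policy output has a fixed size regardless of how long the full reference path is.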