Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

Mohamed Elgouhary, Amr S. El-Wakeel

TL;DR

Across simulation and real-car tests, the proposed RL-PP controller that jointly selects $(L_d, g)$ consistently outperforms fixed-lookahead PP, velocity-scheduled adaptive PP, and an RL lookahead-only variant, demonstrating that policy-guided parameter tuning can reliably improve classical geometry-based control.

Abstract

Pure Pursuit (PP) is widely used in autonomous racing for real-time path tracking due to its efficiency and geometric clarity, yet its performance is highly sensitive to how its key parameters, lookahead distance and steering gain, are chosen. Standard velocity-based schedules adjust these only approximately and often fail to transfer across tracks and speed profiles. We propose a reinforcement-learning (RL) approach that jointly chooses the lookahead distance $L_d$ and a steering gain $g$ online using Proximal Policy Optimization (PPO). The policy observes compact state features (speed and curvature taps) and outputs $(L_d, g)$ at each control step. Trained in F1TENTH Gym and deployed in a ROS 2 stack, the policy drives PP directly (with light smoothing) and requires no per-map retuning. Across simulation and real-car tests, the proposed RL-PP controller that jointly selects $(L_d, g)$ consistently outperforms fixed-lookahead PP, velocity-scheduled adaptive PP, and an RL lookahead-only variant. Under our evaluated settings it also exceeds a kinematic MPC raceline tracker in lap time, path-tracking accuracy, and steering smoothness, demonstrating that policy-guided parameter tuning can reliably improve classical geometry-based control.
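
The abstract does not spell out the control law, so the following is only a minimal sketch of how a lookahead distance and a steering gain might enter a Pure Pursuit step. It assumes the textbook Pure Pursuit geometry with a kinematic bicycle model and a multiplicative gain on the steering angle; the paper's actual formulation may differ. All function names, the wheelbase, the steering limit, the schedule coefficients, and the smoothing rate are hypothetical placeholders, not values from the paper.

```python
import math


def pure_pursuit_steering(alpha, lookahead, gain=1.0, wheelbase=0.33, max_steer=0.4):
    """Steering angle (rad) toward a target point at bearing `alpha` (rad).

    `lookahead` plays the role of L_d and `gain` the role of g.
    `wheelbase` and `max_steer` are assumed placeholder values.
    """
    # Curvature of the circular arc through the target point (textbook PP).
    curvature = 2.0 * math.sin(alpha) / lookahead
    # Kinematic bicycle model: curvature -> front-wheel angle, scaled by the gain.
    steer = gain * math.atan(wheelbase * curvature)
    # Saturate at the (assumed) mechanical steering limit.
    return max(-max_steer, min(max_steer, steer))


def velocity_scheduled_lookahead(speed, base=0.5, slope=0.3, lo=0.5, hi=3.0):
    """Assumed linear speed schedule, standing in for the adaptive-PP baseline."""
    return max(lo, min(hi, base + slope * speed))


def smooth(previous, target, rate=0.3):
    """First-order low-pass filter, one possible reading of 'light smoothing'."""
    return previous + rate * (target - previous)
```

In this sketch a learned policy would replace `velocity_scheduled_lookahead` by supplying both `lookahead` and `gain` at every control step, with `smooth` applied to each output before it reaches `pure_pursuit_steering`. A shorter lookahead gives a larger steering command for the same bearing, which is the sensitivity that makes the choice of $L_d$ matter.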

Paper Structure

This paper contains 15 sections, 10 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: Pipeline: mapping and minimum-curvature raceline feed the simulator and a baseline PP controller. A PPO agent learns $(L_d,g)$ and integrates with PP via ROS topics. We validate in simulation and deploy on a real F1TENTH car.
  • Figure 2: PPO diagnostics (1/2): update stability metrics. Approx. KL and clip fraction remain small after the initial transient, indicating conservative policy updates.
  • Figure 3: PPO diagnostics (2/2): exploration and critic learning. The policy action standard deviation decreases as exploration reduces, while the critic loss stabilizes as value estimates improve.
  • Figure 4: Classical Pure Pursuit failure modes with fixed lookahead.
  • Figure 5: Simulator maps used in our study: (a) Hockenheim for training, (b) Montreal, and (c) YasMarina for evaluation.
  • ...and 3 more figures