RLPP: A Residual Method for Zero-Shot Real-World Autonomous Racing on Scaled Platforms

Edoardo Ghignone, Nicolas Baumann, Cheng Hu, Jonathan Wang, Lei Xie, Andrea Carron, Michele Magno

TL;DR

RLPP introduces a residual RL framework that augments a Pure Pursuit (PP) baseline with a learned residual, enabling zero-shot real-world deployment for scaled autonomous racing on the F1TENTH platform. The final action is $\mathbf{u}=\mathbf{u}_{PP}+\mathbf{u}_{RL}$. Here $\mathbf{u}_{PP}$ provides baseline steering and speed commands via a lookahead point, and the RL residual $\mathbf{u}_{RL}=\alpha_{RL}\mathbf{u}_{NN}$ compensates for real-world dynamics. Training uses Soft Actor-Critic (SAC) in a single-track simulator with domain randomization and a velocity curriculum. On real hardware, RLPP:

- improves the minimum lap time by up to $6.37\%$;
- reduces the sim-to-real gap more than $8$-fold compared with a baseline RL controller;
- closes the gap to state-of-the-art methods by over $52\%$.

The method runs onboard with a computation time of around $8$ ms, enabling $40$ Hz control. The authors also release an open-source implementation to facilitate further experimentation in autonomous racing research.
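The residual composition $\mathbf{u}=\mathbf{u}_{PP}+\alpha_{RL}\mathbf{u}_{NN}$ can be illustrated with a minimal sketch. It combines the textbook pure-pursuit steering law $\delta=\arctan\left(2L\sin\alpha / l_d\right)$ with a scaled residual. Several details are assumptions for illustration and are not taken from the paper:

- the helper names `pure_pursuit_steer` and `residual_action`;
- the per-channel clipping to actuator limits;
- a network output $\mathbf{u}_{NN}$ in $[-1, 1]$, as with a tanh-squashed SAC policy;
- all numeric values.

```python
import math


def pure_pursuit_steer(pose, target, wheelbase):
    """Textbook pure-pursuit steering angle toward a lookahead point.

    pose: (x, y, yaw) of the rear axle in the world frame.
    target: (x, y) of the lookahead point on the reference line.
    """
    x, y, yaw = pose
    dx, dy = target[0] - x, target[1] - y
    alpha = math.atan2(dy, dx) - yaw  # heading error toward the lookahead point
    ld = math.hypot(dx, dy)           # actual distance to the lookahead point
    # delta = atan(2 L sin(alpha) / l_d); atan2 equals atan here because ld > 0
    return math.atan2(2.0 * wheelbase * math.sin(alpha), ld)


def residual_action(u_pp, u_nn, alpha_rl, limits):
    """u = u_PP + alpha_RL * u_NN, clipped per channel (hypothetical helper).

    u_pp, u_nn, alpha_rl: per-channel sequences, e.g. (steer, speed).
    u_nn is assumed to lie in [-1, 1] (tanh-squashed SAC output).
    limits: per-channel (low, high) actuator bounds (assumed).
    """
    u = []
    for pp, nn, a, (lo, hi) in zip(u_pp, u_nn, alpha_rl, limits):
        u.append(min(max(pp + a * nn, lo), hi))
    return tuple(u)


if __name__ == "__main__":
    # Hypothetical numbers: car at the origin facing +x, lookahead point ahead-left.
    steer_pp = pure_pursuit_steer((0.0, 0.0, 0.0), (1.5, 0.3), wheelbase=0.33)
    u_pp = (steer_pp, 4.0)              # baseline steering [rad], speed [m/s]
    u_nn = (-0.2, 0.5)                  # stand-in for a policy output
    alpha_rl = (0.1, 1.0)               # hypothetical residual scales
    limits = ((-0.4, 0.4), (0.0, 8.0))  # hypothetical actuator limits
    print(residual_action(u_pp, u_nn, alpha_rl, limits))
```

Keeping $\alpha_{RL}$ small bounds how far the learned correction can move the command away from the PP baseline. This is one plausible way such a design can stay close to a well-understood controller during zero-shot deployment.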

Abstract

Autonomous racing presents a complex environment requiring robust controllers capable of making rapid decisions under dynamic conditions. While traditional controllers based on tire models are reliable, they often demand extensive tuning or system identification. Reinforcement Learning (RL) methods offer significant potential due to their ability to learn directly from interaction. However, they typically suffer from the sim-to-real gap, where policies trained in simulation fail to perform effectively in the real world. In this paper, we propose RLPP, a residual RL framework that enhances a Pure Pursuit (PP) controller with an RL-based residual. This hybrid approach leverages the reliability and interpretability of PP while using RL to fine-tune the controller's performance in real-world scenarios. Extensive testing on the F1TENTH platform demonstrates that RLPP improves the lap times of the baseline controllers by up to 6.37%, closing the gap to state-of-the-art methods by more than 52%. RLPP also provides reliable performance in zero-shot real-world deployment. It overcomes key challenges of sim-to-real transfer and reduces the performance gap from simulation to reality by more than 8-fold compared with the baseline RL controller. The RLPP framework is available as an open-source tool, encouraging further exploration and advancement in autonomous racing research. The code is available at: www.github.com/forzaeth/rlpp.
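The abstract reports three relative metrics without defining them in this summary: lap-time improvement, fraction of the gap to state-of-the-art closed, and sim-to-real gap reduction. The sketch below shows one plausible set of definitions. These formulas and function names are assumptions, not the paper's stated definitions, and the numbers in the usage example are made up rather than results from the paper.

```python
def relative_improvement(t_base, t_new):
    """Fractional lap-time reduction of t_new relative to t_base (assumed definition)."""
    return (t_base - t_new) / t_base


def gap_closed(t_base, t_new, t_sota):
    """Fraction of the baseline-to-SotA lap-time gap recovered by t_new (assumed)."""
    return (t_base - t_new) / (t_base - t_sota)


def sim_to_real_gap(t_sim, t_real):
    """Absolute lap-time difference between simulation and the real car (assumed)."""
    return abs(t_real - t_sim)


def gap_reduction_factor(gap_reference, gap_new):
    """How many times smaller gap_new is than gap_reference."""
    return gap_reference / gap_new


if __name__ == "__main__":
    # Hypothetical lap times in seconds, for illustration only.
    print(relative_improvement(10.0, 9.5))  # fraction of baseline lap time saved
    print(gap_closed(10.0, 9.5, 9.0))       # fraction of the SotA gap closed
    rl_gap = sim_to_real_gap(10.0, 11.6)    # hypothetical pure-RL gap
    rlpp_gap = sim_to_real_gap(10.0, 10.2)  # hypothetical RLPP gap
    print(gap_reduction_factor(rl_gap, rlpp_gap))
```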


Paper Structure

This paper contains 15 sections, 10 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: RLPP: the residual structure summarized in the picture integrates an RL network with a classical controller, namely PP. This architecture retains the tunability of the traditional method while gaining the performance increase of the data-driven RL approach. This enables zero-shot deployment on a real-world platform and lap time improvement without the system identification required by SotA approaches.
  • Figure 2: Trajectories of the main compared algorithms. Ten consecutive laps are shown for each controller, except for TC-Driver (†), which only attained three consecutive laps without boundary violations. The arrow indicates the starting position for the Frenet $s$ coordinate. The dashed line represents the reference line used by each algorithm. Shaded areas within the track indicate 10-meter segments along the reference line, to facilitate cross-references with Figure 3. Comparing TC-Driver (a) with the proposed method (c) shows that the proposed method follows the reference line more smoothly.
  • Figure 3: Velocity profile comparison across ten consecutive laps, plotted against the Frenet $s$ coordinate. The algorithm presented in this paper is shown in the darker hue, while (top) and (bottom) are shown in the lighter hue. Shaded areas facilitate cross-references with Figure 2.
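Figures 2 and 3 index positions by the Frenet $s$ coordinate and shade 10-meter segments along the reference line. A minimal sketch of how a position could be mapped to $s$ and to a segment index is shown below. It projects onto the closest point of a polyline reference line and returns the arc length up to that point. The polyline representation, the function names, and the omission of closed-track wrap-around handling are all assumptions; this is not the authors' implementation.

```python
import math


def frenet_s(point, ref_line):
    """Arc length (Frenet s) of the closest point on a polyline reference line.

    ref_line: list of (x, y) waypoints; wrap-around on closed tracks is not handled.
    """
    px, py = point
    best_d2, best_s = float("inf"), 0.0
    s_start = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(ref_line, ref_line[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len2 = dx * dx + dy * dy
        if seg_len2 == 0.0:
            continue  # skip duplicate waypoints (zero-length segment)
        seg_len = math.sqrt(seg_len2)
        # Projection parameter of the point onto the segment, clamped to [0, 1].
        t = ((px - x0) * dx + (py - y0) * dy) / seg_len2
        t = min(max(t, 0.0), 1.0)
        cx, cy = x0 + t * dx, y0 + t * dy
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 < best_d2:
            best_d2, best_s = d2, s_start + t * seg_len
        s_start += seg_len
    return best_s


def segment_index(s, segment_length=10.0):
    """Index of the fixed-length shaded segment that contains s."""
    return int(s // segment_length)


if __name__ == "__main__":
    ref = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]  # hypothetical L-shaped reference
    s = frenet_s((11.0, 3.0), ref)
    print(s, segment_index(s))
```

A real racing stack would typically also track the previous $s$ to disambiguate projections where the track passes close to itself, such as near the start/finish line.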