
F1tenth Autonomous Racing With Offline Reinforcement Learning Methods

Prajwal Koirala, Cody Fleming

TL;DR

This paper tackles offline reinforcement learning for autonomous racing on the F1tenth platform, aiming to learn from expert demonstrations and to achieve cross-track generalization. It compares four policy families: the Return-Conditioned Decision Tree Policy (RCDTP), the Decision Transformer (DT), Diffusion Policy (DP), and Q-learning variants. All are trained on offline data collected with a waypoint-based controller and evaluated for zero-shot transfer to unseen racetracks. Results show that RCDTP is highly sample-efficient on single-track data, while the larger architectures (DT, DP) generalize better when trained on multi-track data. Online RL methods struggle to complete laps, and Soft Actor-Critic (SAC) remains the most stable online baseline. The findings guide method selection for offline RL in driving and point to future work on multi-agent scenarios, sim-to-real transfer, and safe reinforcement learning.
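The return-conditioning idea shared by RCDTP and DT can be illustrated with a short sketch. It assumes undiscounted returns-to-go, R_t = r_t + r_{t+1} + ... + r_T, as in the Decision Transformer; this summary does not state RCDTP's exact conditioning signal. The helper names (`returns_to_go`, `build_conditioned_dataset`, `update_target_return`) are hypothetical. Any multi-output regressor, such as a decision tree, could then be fit to the resulting (state, return-to-go) to action pairs.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """R_t = sum_{k >= t} gamma**(k - t) * r_k for a single episode.

    gamma=1.0 gives the undiscounted return-to-go used by Decision Transformer;
    applying it to RCDTP here is an assumption.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):  # accumulate from the last step backwards
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def build_conditioned_dataset(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical helper: append R_t to each state so a regressor learns a = pi(s, R)."""
    rtg = returns_to_go(rewards)
    inputs = [list(s) + [g] for s, g in zip(states, rtg)]
    targets = [list(a) for a in actions]  # e.g. [steering, speed] per step
    return inputs, targets


def update_target_return(target: float, reward: float) -> float:
    """At deployment, reduce the commanded return by the reward just received."""
    return target - reward


if __name__ == "__main__":
    X, y = build_conditioned_dataset(
        states=[[0.0], [1.0], [2.0]],
        actions=[[0.1, 2.0], [0.2, 2.0], [0.3, 2.0]],
        rewards=[1.0, 2.0, 3.0],
    )
    print(X)  # [[0.0, 6.0], [1.0, 5.0], [2.0, 3.0]]
```

A tree-based regressor suits this setup because the conditioning signal is just one extra feature column; the sketch shows the general idea, not the authors' exact training recipe.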

Abstract

Autonomous racing serves as a critical platform for evaluating automated driving systems and enhancing vehicle mobility intelligence. This work investigates offline reinforcement learning methods to train agents within the dynamic F1tenth racing environment. The study begins by exploring the challenges of online training in the Austria race track environment, where agents consistently fail to complete laps. Consequently, this research pivots towards an offline strategy, leveraging an 'expert' demonstration dataset to facilitate agent training. A waypoint-based suboptimal controller is developed to gather data consisting of successful lap episodes. This data is then used to train offline learning-based algorithms, followed by an analysis of the agents' cross-track performance that evaluates their zero-shot transferability from seen to unseen scenarios and their capacity to adapt to changes in environment dynamics. Beyond benchmarking algorithms in autonomous racing scenarios, this study also introduces and describes the machinery of our return-conditioned decision tree-based policy, compares its performance with methods that employ fully connected neural networks, Transformers, and Diffusion Policies, and highlights insights into method selection for training autonomous agents in driving interactions.
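The summary does not say which waypoint-following law the data-collection controller uses. Pure pursuit is a common choice on the F1tenth platform, so the sketch below uses it as an assumed stand-in: it picks a waypoint one lookahead distance ahead, rotates it into the car's frame, and applies the pure-pursuit steering law delta = atan(2 L sin(alpha) / l_d). The wheelbase (0.33 m), steering limit (0.4189 rad), maximum speed, and the speed heuristic are illustrative values, not the paper's, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def find_lookahead_point(waypoints: Sequence[Point], position: Point, lookahead: float) -> Point:
    """Walk forward from the nearest waypoint of a closed, ordered raceline
    and return the first one at least `lookahead` metres away."""
    x, y = position
    dists = [math.hypot(wx - x, wy - y) for wx, wy in waypoints]
    nearest = min(range(len(waypoints)), key=dists.__getitem__)
    for k in range(len(waypoints)):
        i = (nearest + k) % len(waypoints)  # wrap around the closed track
        if dists[i] >= lookahead:
            return waypoints[i]
    return waypoints[nearest]  # no waypoint far enough away: fall back to the nearest


def pure_pursuit_action(
    pose: Tuple[float, float, float],  # (x, y, yaw) in the map frame
    target: Point,
    wheelbase: float = 0.33,     # illustrative value
    max_steer: float = 0.4189,   # illustrative steering limit in radians
    max_speed: float = 6.0,      # illustrative value
) -> Tuple[float, float]:
    """Return (steering, speed) from the pure-pursuit law; the speed rule is hypothetical."""
    x, y, yaw = pose
    dx, dy = target[0] - x, target[1] - y
    # Rotate the target into the vehicle frame (x forward, y to the left).
    local_x = math.cos(yaw) * dx + math.sin(yaw) * dy
    local_y = -math.sin(yaw) * dx + math.cos(yaw) * dy
    ld = math.hypot(local_x, local_y)
    if ld < 1e-6:
        return 0.0, max_speed  # target coincides with the car: drive straight
    alpha = math.atan2(local_y, local_x)  # bearing of the target relative to the heading
    steer = math.atan2(2.0 * wheelbase * math.sin(alpha), ld)
    steer = max(-max_steer, min(max_steer, steer))
    # Hypothetical heuristic: slow down linearly as steering approaches its limit.
    speed = max_speed * (1.0 - 0.5 * abs(steer) / max_steer)
    return steer, speed
```

Recording (observation, action) pairs while such a controller drives full laps would yield the kind of successful-lap dataset the abstract describes; the controller's suboptimality comes from its fixed lookahead and simple speed rule.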

Paper Structure

This paper contains 20 sections, 9 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Overview of the Offline RL Methods used in this work
  • Figure 2: Waypoint-based control of the F1tenth race car
  • Figure 3: Visualization of an episode of the F1tenth simulation on the Austria racetrack
  • Figure 4: Sample reward maps based on progress (derived from the work of brunnbauer2021model; Axel2021racecar)
  • Figure 5: Agent-Environment interaction in a return conditioned setting
  • ...and 3 more figures
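Figure 4 refers to progress-based reward maps. The exact reward used in the paper is not given in this summary, so the following is only an assumed formulation: the car's position is projected onto a closed centerline to obtain an arc-length coordinate s, and the per-step reward is the change in s, unwrapped across the start/finish line. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cumulative_lengths(centerline: Sequence[Point]) -> List[float]:
    """Arc length at each vertex of a closed polyline; the last entry is the lap length."""
    s = [0.0]
    n = len(centerline)
    for i in range(n):
        (x0, y0), (x1, y1) = centerline[i], centerline[(i + 1) % n]
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))
    return s


def project_progress(centerline: Sequence[Point], s: Sequence[float], point: Point) -> float:
    """Arc-length coordinate of the closest point on the closed centerline."""
    px, py = point
    n = len(centerline)
    best_d2, best_s = math.inf, 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = centerline[i], centerline[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        seg2 = dx * dx + dy * dy
        # Clamp the projection parameter so the foot point stays on the segment.
        t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / seg2))
        cx, cy = x0 + t * dx, y0 + t * dy
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 < best_d2:
            best_d2, best_s = d2, s[i] + t * math.sqrt(seg2)
    return best_s


def progress_reward(prev_s: float, curr_s: float, lap_length: float, scale: float = 1.0) -> float:
    """Signed progress along the track; crossing the finish line is unwrapped."""
    delta = curr_s - prev_s
    if delta < -lap_length / 2:    # crossed the finish line driving forward
        delta += lap_length
    elif delta > lap_length / 2:   # crossed it driving backward
        delta -= lap_length
    return scale * delta
```

Because the reward is the difference in arc length, summing it over an episode gives the total distance travelled along the centerline, so a completed lap earns roughly the lap length times `scale`. Any crash penalty would be a separate term.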