Reinforcement Learning-based Dynamic Adaptation for Sampling-Based Motion Planning in Agile Autonomous Driving

Alexander Langmann, Yevhenii Tokarev, Mattia Piccinini, Korbinian Moller, Johannes Betz

TL;DR

This work presents a hybrid motion‑planning approach that layers a high‑level reinforcement learning agent atop a low‑level sampling‑based planner in a Frenet frame to achieve dynamic, interactive maneuvers in autonomous racing. The RL agent selects among predefined cost‑weight sets to balance safety, racing performance, and interaction with opponents, preserving trajectory validity by design. Evaluations in simulation demonstrate that the adaptive planner attains 0% collisions while delivering faster overtakes and better interaction than static parameter configurations, and it generalizes to unseen tracks. The approach offers a path toward adaptive yet interpretable motion planning suitable for safety‑critical autonomous driving beyond racing scenarios.

Abstract

Sampling-based trajectory planners are widely used for agile autonomous driving due to their ability to generate fast, smooth, and kinodynamically feasible trajectories. However, their behavior is often governed by a cost function with manually tuned, static weights, which forces a tactical compromise that is suboptimal across the wide range of scenarios encountered in a race. To address this shortcoming, we propose using a Reinforcement Learning (RL) agent as a high-level behavioral selector that dynamically switches the cost function parameters of an analytical, low-level trajectory planner during runtime. We show the effectiveness of our approach in simulation in an autonomous racing environment, where our RL-based planner achieved a 0% collision rate while reducing overtaking time by up to 60% compared to state-of-the-art static planners. The agent dynamically switches between aggressive and conservative behaviors, enabling interactive maneuvers unattainable with static configurations. These results demonstrate that integrating reinforcement learning as a high-level selector resolves the inherent trade-off between safety and competitiveness in autonomous racing planners. The proposed methodology offers a pathway toward adaptive yet interpretable motion planning for broader autonomous driving applications.
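The selector idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature names (`raceline_dev`, `speed_loss`, `proximity`), the two weight sets, and all numeric values are assumptions chosen for illustration. The sketch only shows that a discrete action picks one predefined weight set, and that the planner then returns the cheapest trajectory from an already feasible candidate set, so the choice of weights cannot produce an invalid trajectory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Cost features of one sampled trajectory (hypothetical feature names)."""
    name: str
    raceline_dev: float  # lateral deviation from the raceline
    speed_loss: float    # speed deficit relative to a target speed
    proximity: float     # closeness to the opponent (higher means closer)


# Hypothetical predefined weight sets; the values are illustrative only.
WEIGHT_SETS = {
    0: {"raceline_dev": 1.0, "speed_loss": 1.0, "proximity": 5.0},  # conservative
    1: {"raceline_dev": 0.2, "speed_loss": 3.0, "proximity": 0.5},  # aggressive
}


def trajectory_cost(candidate: Candidate, weights: dict) -> float:
    """Weighted sum of the cost features of a single candidate."""
    return (
        weights["raceline_dev"] * candidate.raceline_dev
        + weights["speed_loss"] * candidate.speed_loss
        + weights["proximity"] * candidate.proximity
    )


def select_trajectory(candidates: list, action: int) -> Candidate:
    """Return the cheapest candidate under the weight set chosen by `action`.

    All candidates are assumed to have passed the planner's feasibility
    checks already; the action only changes how they are ranked.
    """
    weights = WEIGHT_SETS[action]
    return min(candidates, key=lambda c: trajectory_cost(c, weights))


CANDIDATES = [
    Candidate("follow", raceline_dev=0.0, speed_loss=2.0, proximity=0.1),
    Candidate("overtake", raceline_dev=1.5, speed_loss=0.0, proximity=1.0),
]

for action in (0, 1):
    print(action, select_trajectory(CANDIDATES, action).name)
```

With these illustrative numbers, the conservative set ranks the "follow" candidate first and the aggressive set ranks the "overtake" candidate first, even though the candidate set itself is unchanged. In the paper, the action would come from the trained policy rather than from a loop over both values.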

Paper Structure

This paper contains 15 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Comparison of our RL-based planner with dynamic weight adaptation against a static-weight planner.
  • Figure 2: Framework overview: Our RL agent observes the states of the ego vehicle and its closest opponent, along with track-dependent features, which are processed by a PPO-based actor–critic model. The RL actions $a_{t}$ adapt the cost function parameters of a sampling-based planner to boost the performance in dynamic overtaking maneuvers.
  • Figure 3: Representation of the interaction zone (dark grey area) that triggers the active terms in the RL reward function. When the ego vehicle enters this zone, the reward logic activates gap and collision penalties, and fixed raceline following is suppressed. Trajectories are color-coded by costs from green (cheap) to red (expensive).
  • Figure 4: Qualitative analysis of an overtaking maneuver, comparing our RL-based planner against a static-weight planner with the NR parameter set. In both cases, the opponent is a reactive NR-planner. Left: driven trajectories in a section of the analyzed scenario. Right: lateral positions (top), longitudinal gap $\Delta s$ between ego and opponent (upper middle, where $\Delta s>0$ means the opponent is ahead of the ego), velocity profiles (lower middle), and selected parameter set from our planner (bottom) during the scenario.
  • Figure 5: Overtake scenario similar to Figure 4, in our 3D simulation environment. Dark green trajectories denote the selected trajectory in each step from the set of available trajectories (light green), and the red trajectory visualizes the opponent's behavior. Note that only a subset of the generated trajectories is shown.
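
The zone-gated reward logic described in the caption of Figure 3 can be sketched as follows. This is a hedged illustration, not the paper's reward function: the zone geometry (a longitudinal window behind the opponent), the constants `ZONE_LENGTH`, `W_RACELINE`, `W_GAP`, and `COLLISION_PENALTY`, and the linear form of each term are all assumptions. Only the gating behavior is taken from the caption: inside the zone, gap and collision penalties are active and raceline following is suppressed.

```python
# All constants below are hypothetical values chosen for illustration.
ZONE_LENGTH = 30.0         # assumed longitudinal extent of the interaction zone
W_RACELINE = 1.0           # assumed weight of the raceline-following term
W_GAP = 0.1                # assumed weight of the gap penalty
COLLISION_PENALTY = 100.0  # assumed penalty applied on collision


def in_interaction_zone(delta_s: float) -> bool:
    """Assumed zone test: the opponent is ahead (delta_s > 0) and close."""
    return 0.0 < delta_s <= ZONE_LENGTH


def step_reward(delta_s: float, raceline_dev: float, collided: bool) -> float:
    """Reward for one step, with terms gated by the interaction zone.

    delta_s      -- longitudinal gap; positive means the opponent is ahead
    raceline_dev -- lateral deviation of the ego vehicle from the raceline
    collided     -- whether a collision occurred in this step
    """
    if not in_interaction_zone(delta_s):
        # Outside the zone only raceline following is rewarded.
        return -W_RACELINE * abs(raceline_dev)

    # Inside the zone the raceline term is suppressed and the
    # gap and collision penalties become active.
    reward = -W_GAP * delta_s
    if collided:
        reward -= COLLISION_PENALTY
    return reward


print(step_reward(50.0, 0.5, False))  # outside the zone
print(step_reward(10.0, 0.5, False))  # inside the zone, no collision
print(step_reward(10.0, 0.5, True))   # inside the zone, collision
```

The gating means that deviating from the raceline costs nothing while the ego vehicle is interacting with the opponent, which is what allows an overtaking line to be preferred there.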