Learning to Sample: Reinforcement Learning-Guided Sampling for Autonomous Vehicle Motion Planning

Korbinian Moller, Roland Stroop, Mattia Piccinini, Alexander Langmann, Johannes Betz

Abstract

Sampling-based motion planning is a well-established approach in autonomous driving, valued for its modularity and analytical tractability. In complex urban scenarios, however, uniform or heuristic sampling often produces many infeasible or irrelevant trajectories. We address this limitation with a hybrid framework that learns where to sample while keeping trajectory generation and evaluation fully analytical and verifiable. A reinforcement learning (RL) agent guides the sampling process toward regions of the action space likely to yield feasible trajectories, while evaluation and final selection remain governed by deterministic feasibility checks and cost functions. We couple the RL sampler with a world model (WM) based on a decodable deep set encoder, enabling both variable numbers of traffic participants and reconstructable latent representations. The approach is evaluated in the CommonRoad (CR) simulation environment and compared against uniform-sampling baselines, showing up to 99% fewer required samples and a runtime reduction of up to 84% while maintaining planning quality in terms of success and collision-free rates. These improvements lead to faster, more reliable decision-making for autonomous vehicles in urban environments.

Paper Structure

This paper contains 13 sections, 12 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Comparison of uniform sampling and the proposed guided sampling in a scenario with a stationary obstacle.
  • Figure 2: High-level framework. The WM encodes the environment into a latent state $z_t$, where an RL agent selects high-level end-conditions $g(\tau)$. A deterministic planner then generates and evaluates trajectories that satisfy kinematic and safety constraints before producing the optimal trajectory. Blue paths in the diagram indicate components that are only used during training.
  • Figure 3: A driving scenario is mapped into the structured observation space $\mathcal{O}_t$, represented in curvilinear coordinates $(s,d)$ around a given $\Gamma$.
  • Figure 4: Neighborhood sampling around the goal state $g(\tau)$ returned by the RL agent. The planner perturbs $g(\tau)$ within the local set $\mathcal{G}(g(\tau))$ and generates the corresponding candidate trajectories $\mathcal{T}$.
  • Figure 5: Aggregate performance across all evaluation scenarios. Bars indicate success and failure outcomes, while crosses (✕) denote the number of sampled trajectories. Although B125 and RL2 achieve similar success rates, our RL-guided variants concentrate sampling in promising regions, resulting in a higher fraction of feasible and drivable candidates per sample (see the trajectory distribution figure).
  • ...and 4 more figures
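
The neighborhood sampling described in the Figure 4 caption, combined with the deterministic selection described in the abstract, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The end-condition layout (lateral offset, end velocity, horizon), the offset grid, and the names `neighborhood`, `select`, `is_feasible`, and `cost` are all assumptions made for illustration. The sketch only shows the general pattern: perturb a proposed goal within a local set, discard infeasible candidates with a deterministic check, and return the lowest-cost survivor.

```python
import itertools

def neighborhood(goal, deltas):
    """Build a local set around `goal`: every combination of per-axis offsets.

    `goal` is a tuple of end-condition values; `deltas` holds one tuple of
    offsets per axis. Both layouts are hypothetical, not taken from the paper.
    """
    axes = [[g + o for o in offsets] for g, offsets in zip(goal, deltas)]
    return list(itertools.product(*axes))

def select(goal, deltas, is_feasible, cost):
    """Keep candidates that pass the deterministic check, return the cheapest."""
    candidates = neighborhood(goal, deltas)
    feasible = [c for c in candidates if is_feasible(c)]
    return min(feasible, key=cost) if feasible else None

# Hypothetical goal proposed by the agent:
# (lateral offset d [m], end velocity v [m/s], horizon T [s])
goal = (1.0, 8.0, 3.0)
deltas = [(-0.5, 0.0, 0.5), (-1.0, 0.0, 1.0), (0.0,)]

# Placeholder feasibility check and cost function (assumptions for the demo).
best = select(
    goal,
    deltas,
    is_feasible=lambda c: abs(c[0]) <= 1.25 and c[1] <= 8.5,
    cost=lambda c: abs(c[0]) + 0.1 * abs(c[1] - 8.0),
)
print(best)
```

In this sketch the learned component would only supply `goal`; the checks and the cost stay ordinary deterministic functions, which mirrors the separation the abstract describes between guided sampling and analytical evaluation.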