Learning Soft Driving Constraints from Vectorized Scene Embeddings while Imitating Expert Trajectories

Niloufar Saeidi Mobarakeh, Behzad Khamidehi, Chunlin Li, Hamidreza Mirkhani, Fazel Arasteh, Mohammed Elmahgiubi, Weize Zhang, Kasra Rezaee, Pascal Poupart

TL;DR

The paper addresses interpretability gaps in imitation-learning-based motion planning by learning driving constraints directly from expert trajectories using vectorized scene embeddings within a maximum-entropy framework, thereby decoupling reward and constraint streams. The trajectory selection probability is defined as $P(\tau|\pi) = \frac{c(\tau) e^{r(\tau)}}{\sum_{i=1}^{N} c(\tau_i) e^{r(\tau_i)}}$, and the constraint model is trained with a loss $\mathcal{L}(\theta) = \frac{1}{|\bar{\mathcal{T}}|+1}\sum_{\tau_i \in \bar{\mathcal{T}}} \gamma(c_\theta(\tau_i), 0) + \gamma(c_\theta(\tau_{best}), 1)$. The approach is simulator-free and validated on the InD and TrafficJams datasets, showing improved interpretability, safer closed-loop performance, and better attention to causal agents compared to score-based baselines. By separating reward and constraint signals and leveraging vectorized scene embeddings, the method provides a transparent mechanism to reason about safety and behavior in complex driving scenarios.
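The two formulas above can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions, not the authors' implementation: the loss function $\gamma$ is not defined in this summary, so binary cross-entropy is assumed; $\bar{\mathcal{T}}$ is assumed to be the set of constraint-violating trajectories; the $\frac{1}{|\bar{\mathcal{T}}|+1}$ factor is assumed to normalize both terms of the loss; and the function names are hypothetical.

```python
import math


def selection_probabilities(rewards, constraints):
    """P(tau_i) = c(tau_i) * exp(r(tau_i)) / sum_j c(tau_j) * exp(r(tau_j))."""
    if len(rewards) != len(constraints):
        raise ValueError("rewards and constraints must have the same length")
    # Subtracting the maximum reward leaves the ratio unchanged and avoids overflow.
    r_max = max(rewards)
    weights = [c * math.exp(r - r_max) for r, c in zip(rewards, constraints)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("every trajectory has a zero constraint value")
    return [w / total for w in weights]


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, used here as an ASSUMED choice for gamma."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def constraint_loss(c_violating, c_best, gamma=bce):
    """Push violating trajectories toward 0 and the best trajectory toward 1.

    ASSUMPTION: the 1 / (|T_bar| + 1) factor averages over both terms.
    """
    total = sum(gamma(c, 0.0) for c in c_violating) + gamma(c_best, 1.0)
    return total / (len(c_violating) + 1)


# Three candidate trajectories; the second has constraint value 0.
probs = selection_probabilities([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
loss = constraint_loss(c_violating=[0.05, 0.10], c_best=0.95)
```

Because the constraint value multiplies the exponentiated reward, a trajectory with $c(\tau) = 0$ receives zero selection probability regardless of its reward, which is what allows the constraint stream to be inspected separately from the reward stream.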

Abstract

The primary goal of motion planning is to generate safe and efficient trajectories for vehicles. Traditionally, motion planning models are trained using imitation learning to mimic the behavior of human experts. However, these models often lack interpretability and fail to provide clear justifications for their decisions. We propose a method that integrates constraint learning into imitation learning by extracting driving constraints from expert trajectories. Our approach utilizes vectorized scene embeddings that capture critical spatial and temporal features, enabling the model to identify and generalize constraints across various driving scenarios. We formulate the constraint learning problem using a maximum entropy model, which scores the motion planner's trajectories based on their similarity to the expert trajectory. By separating the scoring process into distinct reward and constraint streams, we improve both the interpretability of the planner's behavior and its attention to relevant scene components. Unlike existing constraint learning methods that rely on simulators and are typically embedded in reinforcement learning (RL) or inverse reinforcement learning (IRL) frameworks, our method operates without simulators, making it applicable to a wider range of datasets and real-world scenarios. Experimental results on the InD and TrafficJams datasets demonstrate that incorporating driving constraints enhances model interpretability and improves closed-loop performance.

Paper Structure

This paper contains 8 sections, 5 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: The architecture of our planner. (a) A reward-only scheme, where the trajectories are scored based on their similarity to the expert trajectory. (b) The proposed combined reward and constraint learning scheme, where the model learns the constraint values along with the rewards.
  • Figure 2: Our constraint labeling scheme. (a) A snapshot of the scene at time $t$. (b) The different kinds of trajectories available to the ego vehicle at time $t$, including collision and out-of-map trajectories. (c) The same scene at time $t+T$, showing stuck trajectories. A trajectory is labeled as stuck only if at least one other non-stuck, safe trajectory exists.
  • Figure 3: Snapshots of the scene demonstrating instances where trajectories with higher scores exhibit unsafe behavior, while constraint values effectively distinguish between trajectories. (a) Forward navigation scenario, (b) Left turn at intersection. In both cases, the ego vehicle is shown in red and the traffic vehicles are blue.
  • Figure 4: Visualization of the model's attention to objects in the scene. For clarity, only the two objects with the highest attention weights are highlighted, with larger circles indicating greater attention. The top two rows correspond to a scenario from the InD dataset, while the bottom two rows represent a scenario from the TrafficJams dataset. In each scenario, the top row (rows 1 and 3) shows the baseline model, and the bottom row (rows 2 and 4) displays the constraint-based model. As illustrated, the baseline model fails to assign sufficient attention to the causal agents. In contrast, the model trained with constraint modules successfully identifies the causal agents and assigns greater attention to them. Additional videos demonstrating these scenarios and the model's performance are available online (https://youtu.be/PY0luaE3wYI).
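
The labeling rule described in the Figure 2 caption can be sketched in a few lines. This is a hypothetical illustration rather than the paper's code: the `Candidate` class and its boolean flags are invented for the example, how collisions, map exits, and stuck states are detected is left out, and the convention that 0 marks a violating trajectory and 1 a feasible one is an assumption taken from the targets in the TL;DR loss.

```python
class Candidate:
    """Hypothetical container for per-trajectory flags (not from the paper)."""

    def __init__(self, collides, out_of_map, stuck):
        self.collides = collides
        self.out_of_map = out_of_map
        self.stuck = stuck


def is_safe(candidate):
    """A trajectory is safe if it neither collides nor leaves the map."""
    return not (candidate.collides or candidate.out_of_map)


def label_constraints(candidates):
    """Return one label per candidate: 0 = violating, 1 = feasible (ASSUMED convention)."""
    # A stuck trajectory counts as a violation only if a safe, non-stuck option exists.
    has_alternative = any(is_safe(c) and not c.stuck for c in candidates)
    labels = []
    for c in candidates:
        if not is_safe(c):
            labels.append(0)
        elif c.stuck and has_alternative:
            labels.append(0)
        else:
            labels.append(1)
    return labels


scene = [
    Candidate(collides=True, out_of_map=False, stuck=False),
    Candidate(collides=False, out_of_map=True, stuck=False),
    Candidate(collides=False, out_of_map=False, stuck=True),
    Candidate(collides=False, out_of_map=False, stuck=False),
]
labels = label_constraints(scene)
```

The `has_alternative` check mirrors the caption's condition: when every safe trajectory is stuck (for example, in dense traffic), remaining stationary is not penalized.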