Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans

Nachiket Deo, Mohan M. Trivedi

TL;DR

This work tackles trajectory forecasting in unknown environments by conditioning forecasts on plans sampled from a grid-based maximum entropy inverse reinforcement learning (MaxEnt IRL) policy. It introduces P2T, a three-component pipeline: a convolutional reward model that produces transient path rewards and terminal goal rewards on a coarse 2-D grid; a reformulated MaxEnt IRL policy that jointly infers goals and paths from these rewards; and an attention-based trajectory generator that maps sampled plans and motion history to continuous future trajectories, which are then clustered into K representative predictions. By jointly inferring goals and plans and conditioning trajectory generation on those plans, the approach yields multimodal, scene-constrained forecasts with improved precision and diversity. Empirical results on the Stanford Drone Dataset (SDD) and NuScenes show state-of-the-art or competitive performance across key metrics, notably lower off-road and off-yaw rates, and real-time inference suitable for on-board deployment.
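The final step, reducing many sampled trajectories to K representative predictions, can be sketched in plain Python. This is a minimal illustration, assuming k-means over flattened (x, y) trajectories with a deterministic farthest-point initialization; the function `cluster_trajectories` is hypothetical, and the summary above does not specify P2T's exact clustering algorithm or its settings.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def _flatten(traj: Trajectory) -> List[float]:
    """Concatenate T (x, y) points into one vector in R^(2T)."""
    return [c for p in traj for c in p]


def _sq_dist(a: List[float], b: List[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_trajectories(samples: List[Trajectory], k: int,
                         iters: int = 50) -> List[Trajectory]:
    """Hypothetical helper: k-means over sampled trajectories.

    Returns k cluster means, reshaped back into T (x, y) points each.
    """
    vecs = [_flatten(t) for t in samples]
    # Farthest-point init: start from the first sample, then repeatedly add
    # the sample that is farthest from all centers chosen so far.
    centers = [vecs[0][:]]
    while len(centers) < k:
        far = max(vecs, key=lambda v: min(_sq_dist(v, c) for c in centers))
        centers.append(far[:])
    for _ in range(iters):
        # Assignment step: each trajectory joins its nearest center.
        groups: List[List[List[float]]] = [[] for _ in range(k)]
        for v in vecs:
            j = min(range(k), key=lambda c: _sq_dist(v, centers[c]))
            groups[j].append(v)
        # Update step: move each center to the mean of its members.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    steps = len(samples[0])
    return [[(c[2 * t], c[2 * t + 1]) for t in range(steps)] for c in centers]
```

Averaging within clusters means each returned prediction summarizes a mode of the sampled distribution rather than being a single raw sample.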

Abstract

We address the problem of forecasting pedestrian and vehicle trajectories in unknown environments, conditioned on their past motion and scene structure. Trajectory forecasting is a challenging problem due to the large variation in scene structure and the multimodal distribution of future trajectories. Unlike prior approaches that directly learn one-to-many mappings from observed context to multiple future trajectories, we propose to condition trajectory forecasts on plans sampled from a grid-based policy learned using maximum entropy inverse reinforcement learning (MaxEnt IRL). We reformulate MaxEnt IRL to allow the policy to jointly infer plausible agent goals and paths to those goals on a coarse 2-D grid defined over the scene. We propose an attention-based trajectory generator that generates continuous-valued future trajectories conditioned on state sequences sampled from the MaxEnt policy. Quantitative and qualitative evaluation on the publicly available Stanford Drone and NuScenes datasets shows that our model generates trajectories that are diverse, representing the multimodal predictive distribution, and precise, conforming to the underlying scene structure over long prediction horizons.

Paper Structure

This paper contains 14 sections, 12 equations, 8 figures, 6 tables, and 4 algorithms.

Figures (8)

  • Figure 1: Forecasts generated by P2T: We address the problem of forecasting agent trajectories in unknown environments. The inputs to our model (left) are snippets of the agents' past trajectories and a bird's eye view representation of the scene around them. Our model infers potential goals of the agents (left-middle) and paths to these goals (middle) over a coarse 2-D grid defined over the scene by modeling the agent as a MaxEnt policy exploring the grid. It generates continuous-valued trajectories conditioned on the grid-based plans sampled from the policy (middle-right). Finally, it outputs K predicted trajectories by clustering the sampled trajectories (right).
  • Figure 2: Overview: P2T consists of three modules: (1) a fully convolutional reward model that outputs transient path state rewards and terminal goal state rewards on a coarse 2-D grid, (2) a MaxEnt RL policy for the learned path and goal rewards that can be sampled to generate multimodal plans on the 2-D grid, and (3) an attention-based trajectory generator that outputs continuous-valued trajectories conditioned on the sampled plans.
  • Figure 3: Reward model: $\hbox{CNN}_{feat}$ extracts features from the static scene. We concatenate these with feature maps capturing the agent's motion. $\hbox{CNN}_p$ and $\hbox{CNN}_g$ learn path and goal rewards from the features.
  • Figure 4: Plan encoder: For each state in a sampled plan, we encode the scene features, surrounding agent states and the location coordinates of the grid cell, and term the result $\phi_{S}(s)$. These encodings are fed into a bidirectional GRU to encode the entire sampled plan. Our GRU decoder generates output trajectories by attending to the plan encoding.
  • Figure 5: Sample quality metrics. MinADE$_K$, MinFDE$_K$ and miss rate fail to penalize a diverse set of trajectories that don't conform to the scene (left). The off-road rate (middle) and off-yaw (right) metrics address this by penalizing predicted points that fall off the drivable area or onto oncoming traffic. Warm colors indicate higher errors.
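The sample-quality metrics named in the Figure 5 caption can be computed in a few lines of Python. This is a sketch under stated assumptions: trajectories are lists of (x, y) points in metres at matching timesteps; a miss is declared when every one of the K predictions deviates from the ground truth by more than an assumed 2 m at some timestep (a nuScenes-style definition, and benchmarks differ); and the off-road rate uses a hypothetical boolean bird's-eye-view drivable mask at a given cell resolution. The off-yaw rate is omitted because it additionally needs lane-direction information.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Traj = Sequence[Point]


def min_ade_k(preds: List[Traj], gt: Traj) -> float:
    """Mean pointwise L2 error of the best of the K predictions."""
    return min(sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
               for pred in preds)


def min_fde_k(preds: List[Traj], gt: Traj) -> float:
    """Final-point L2 error of the best of the K predictions."""
    return min(math.dist(pred[-1], gt[-1]) for pred in preds)


def is_miss(preds: List[Traj], gt: Traj, threshold: float = 2.0) -> bool:
    """True if no prediction stays within `threshold` of gt at every step.

    The 2 m default is an assumption (nuScenes-style max pointwise distance).
    """
    return all(max(math.dist(p, g) for p, g in zip(pred, gt)) > threshold
               for pred in preds)


def off_road_rate(preds: List[Traj], drivable: List[List[bool]],
                  resolution: float = 1.0) -> float:
    """Fraction of predicted points that land on non-drivable cells.

    drivable[row][col] is a hypothetical BEV mask; a point (x, y) maps to
    col = x // resolution and row = y // resolution. Points outside the
    mask are counted as off-road.
    """
    total = off = 0
    for pred in preds:
        for x, y in pred:
            col, row = int(x // resolution), int(y // resolution)
            inside = 0 <= row < len(drivable) and 0 <= col < len(drivable[0])
            if not (inside and drivable[row][col]):
                off += 1
            total += 1
    return off / total
```

As the Figure 5 caption notes, MinADE_K, MinFDE_K and miss rate only score the best prediction, so a set with one accurate trajectory and several off-road ones still scores well; `off_road_rate` counts every predicted point and therefore penalizes such sets.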