LongComp: Long-Tail Compositional Zero-Shot Generalization for Robust Trajectory Prediction

Benjamin Stoler, Jonathan Francis, Jean Oh

TL;DR

The paper tackles robust trajectory prediction under rare, safety-critical scenarios by introducing a safety-informed, long-tail compositional zero-shot evaluation framework. The framework factorizes driving scenarios into ego and social contexts and holds out context combinations to create closed-world and open-world OOD test sets. It extends compositional zero-shot learning (CZSL) concepts to autonomous driving and develops two generalization techniques that operate on a bottleneck representation to improve OOD performance: a TMN-inspired task-modular gating mechanism and an auxiliary difficulty-prediction head. Using the Waymo Open Motion Dataset, the authors measure baseline OOD gaps of $5.0\%$ (closed-world) and $14.7\%$ (open-world) and reduce them to $2.8\%$ and $11.5\%$ with their methods, while also achieving modest in-distribution gains. The result is a structured, interpretable framework for evaluating and improving motion prediction under long-tail conditions, with implications for safer autonomous-driving systems and for compositional generalization research more broadly.

Abstract

Methods for trajectory prediction in Autonomous Driving must contend with rare, safety-critical scenarios that make reliance on real-world data collection alone infeasible. To assess robustness under such conditions, we propose new long-tail evaluation settings that repartition datasets to create challenging out-of-distribution (OOD) test sets. We first introduce a safety-informed scenario factorization framework, which disentangles scenarios into discrete ego and social contexts. Building on analogies to compositional zero-shot image-labeling in Computer Vision, we then hold out novel context combinations to construct challenging closed-world and open-world settings. This process induces OOD performance gaps in future motion prediction of 5.0% and 14.7% in closed-world and open-world settings, respectively, relative to in-distribution performance for a state-of-the-art baseline. To improve generalization, we extend task-modular gating networks to operate within trajectory prediction models, and develop an auxiliary, difficulty-prediction head to refine internal representations. Our strategies jointly reduce the OOD performance gaps to 2.8% and 11.5% in the two settings, respectively, while still improving in-distribution performance.
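The hold-out idea in the abstract can be illustrated with a minimal sketch. It assumes each scenario already carries a discrete ego label and a discrete social label; the dictionary keys, the function names, and the toy error values are all hypothetical and do not come from the paper. The relative-gap formula is one plausible reading of "relative to in-distribution performance", not a confirmed definition, and the sketch does not try to reproduce the paper's closed-world versus open-world distinction.

```python
from itertools import product


def compositional_split(scenarios, held_out_pairs):
    """Send scenarios whose (ego, social) pair is held out to the OOD test set."""
    held_out = set(held_out_pairs)
    train, ood_test = [], []
    for s in scenarios:
        pair = (s["ego"], s["social"])
        (ood_test if pair in held_out else train).append(s)
    return train, ood_test


def primitives_covered(train, ood_test):
    """True if every ego and social label in the OOD set also occurs in training.

    This is the usual compositional zero-shot requirement: the combination is
    novel, but each individual context has been seen.
    """
    seen_ego = {s["ego"] for s in train}
    seen_social = {s["social"] for s in train}
    return all(
        s["ego"] in seen_ego and s["social"] in seen_social for s in ood_test
    )


def relative_gap(id_error, ood_error):
    """Assumed gap definition: percent increase of OOD error over ID error."""
    return 100.0 * (ood_error - id_error) / id_error


# Toy data: one scenario per combination of 3 ego and 3 social labels.
scenarios = [{"ego": e, "social": s} for e, s in product(range(3), range(3))]
train, ood_test = compositional_split(scenarios, [(0, 1), (2, 0)])
covered = primitives_covered(train, ood_test)

# Hypothetical error values, chosen only to show the arithmetic.
gap = relative_gap(id_error=2.0, ood_error=2.1)
```

In this toy split, the two held-out pairs never appear in training, yet ego labels 0 and 2 and social labels 0 and 1 each still occur in other training pairs, which is what makes the test compositional rather than simply unseen.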

Paper Structure

This paper contains 13 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of our framework. (a) Traffic scenarios are factorized and clustered along explicitly disentangled ego and social axes. (b) These contexts are then used to create challenging compositional zero-shot evaluation settings for trajectory prediction, and to enable generalization strategies to enhance OOD robustness.
  • Figure 2: UMAP (McInnes et al., 2018) visualizations for ego and social contexts across agent types. Colors correspond to the discretized context labels produced by the clustering described in the discretization section, assigned independently per diagram.
  • Figure 3: Cluster examples, by context and agent behavior types. In each subgrid of 4 examples, one context (e.g., ego) is fixed while the paired context (e.g., social) varies, with a brief caption describing the shared behavior. Agent markers show positions at $T_\text{hist}$; the focal agent is shown as a large, open circle, while background agents are smaller and filled.