LongComp: Long-Tail Compositional Zero-Shot Generalization for Robust Trajectory Prediction
Benjamin Stoler, Jonathan Francis, Jean Oh
TL;DR
The paper addresses robust trajectory prediction in rare, safety-critical scenarios by introducing a safety-informed, long-tail compositional zero-shot evaluation framework. The framework factorizes driving scenarios into discrete ego and social contexts and holds out novel context combinations to create closed-world and open-world out-of-distribution (OOD) test sets. It extends compositional zero-shot learning (CZSL) concepts to autonomous driving and develops two generalization techniques, a TMN-inspired task-modular gating mechanism and an auxiliary difficulty-prediction head, that operate on a bottleneck representation to improve OOD performance. Using the Waymo Open Motion Dataset, the authors measure OOD gaps of $5.0\%$ (closed-world) and $14.7\%$ (open-world) relative to in-distribution performance for a state-of-the-art baseline, and reduce them to $2.8\%$ and $11.5\%$ with their methods while also achieving modest in-distribution gains. The work provides a structured, interpretable framework for evaluating and improving robust motion prediction under long-tail conditions, with implications for safer autonomous-driving systems and for broader compositional generalization research.
Abstract
Methods for trajectory prediction in Autonomous Driving must contend with rare, safety-critical scenarios that make reliance on real-world data collection alone infeasible. To assess robustness under such conditions, we propose new long-tail evaluation settings that repartition datasets to create challenging out-of-distribution (OOD) test sets. We first introduce a safety-informed scenario factorization framework, which disentangles scenarios into discrete ego and social contexts. Building on analogies to compositional zero-shot image-labeling in Computer Vision, we then hold out novel context combinations to construct challenging closed-world and open-world settings. This process induces OOD performance gaps in future motion prediction of 5.0% and 14.7% in closed-world and open-world settings, respectively, relative to in-distribution performance for a state-of-the-art baseline. To improve generalization, we extend task-modular gating networks to operate within trajectory prediction models, and develop an auxiliary, difficulty-prediction head to refine internal representations. Our strategies jointly reduce the OOD performance gaps to 2.8% and 11.5% in the two settings, respectively, while still improving in-distribution performance.
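The hold-out idea and the gap figures above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each scenario already carries one discrete ego-context label and one discrete social-context label, that the metric is an error where lower is better, and that the gap is the relative increase of OOD error over in-distribution error. The context names (`"turning"`, `"dense"`, and so on) and the helpers `split_by_held_out_pairs` and `relative_gap` are hypothetical, introduced only for illustration.

```python
from itertools import product


def split_by_held_out_pairs(scenarios, held_out_pairs):
    """Partition scenarios into a training set and an OOD test set.

    A scenario goes to the OOD test set when its (ego, social) context
    pair is one of the held-out combinations; otherwise it stays in training.
    """
    held_out = set(held_out_pairs)
    train, ood_test = [], []
    for scenario in scenarios:
        pair = (scenario["ego"], scenario["social"])
        if pair in held_out:
            ood_test.append(scenario)
        else:
            train.append(scenario)
    return train, ood_test


def relative_gap(id_error, ood_error):
    """Relative OOD gap in percent, assuming a lower-is-better error metric."""
    return 100.0 * (ood_error - id_error) / id_error


# Hypothetical context labels; the paper's actual context taxonomy is not shown here.
ego_contexts = ["turning", "braking"]
social_contexts = ["dense", "sparse"]

# One toy scenario per (ego, social) combination.
scenarios = [
    {"id": i, "ego": ego, "social": social}
    for i, (ego, social) in enumerate(product(ego_contexts, social_contexts))
]

# Hold out one combination: each individual context still appears in training,
# but this particular pairing is only seen at test time.
train, ood_test = split_by_held_out_pairs(scenarios, [("turning", "dense")])
```

In this toy split, every ego context and every social context still occurs in the training set, so only the combination is novel at test time, which is the sense of "compositional zero-shot" assumed here. Under the assumed definition, `relative_gap` returns a percentage that is comparable in form to the gaps reported in the abstract, though the paper's exact metric and normalization may differ.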
