Dynamic Model Selection for Trajectory Prediction via Pairwise Ranking and Meta-Features
Lu Bowen
TL;DR
This work tackles the reliability gap in autonomous-vehicle trajectory prediction by introducing a dynamic multi-expert gating system that selects the most reliable predictor on a per-scene basis. The gate takes a 36-dimensional meta-feature vector encoding model-internal signals (uncertainty, stability, physics violations) and is trained as a pairwise ranking problem, enabling per-sample selection without post-hoc calibration. An LLM-based semantic supervisor provides risk-aware overrides for high-conflict cases. On nuPlan-mini, the full system achieves $FDE = 2.567\,\mathrm{m}$, a $9.5\%$ improvement over the best single expert, and realizes $57.8\%$ of the oracle gap. The gains extend to long-tail and safety-critical scenarios across offline and open-loop evaluation (for example, approximately $10\%$ lower FDE on left-turn scenarios in open-loop simulation), highlighting the practical potential of adaptive hybrid systems for autonomous driving. The study also discusses deployment trade-offs and limitations, including latency and dataset generalization, and suggests future work such as broader expert pools and distillation-based latency reduction.
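To make the pairwise-ranking formulation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a linear scorer over the 36-dimensional meta-features, a RankNet-style logistic loss in the spirit of Burges et al. (2005), three experts, and purely synthetic data. The names `pairwise_ranking_loss`, `train_gate`, and `select_expert` are hypothetical, and the LLM-based supervisor is not modelled.

```python
import numpy as np

N_EXPERTS = 3     # assumption: three candidate predictors per scene
N_FEATURES = 36   # meta-feature dimension stated in the text


def pairwise_ranking_loss(w, feats, fde):
    """RankNet-style logistic loss and gradient for one scene.

    feats: (N_EXPERTS, N_FEATURES) meta-features, one row per expert.
    fde:   (N_EXPERTS,) final displacement error per expert (lower is better).
    """
    scores = feats @ w
    loss, grad = 0.0, np.zeros_like(w)
    for i in range(len(fde)):
        for j in range(len(fde)):
            if fde[i] < fde[j]:  # expert i should outrank expert j
                margin = scores[i] - scores[j]
                loss += np.logaddexp(0.0, -margin)  # log(1 + exp(-margin))
                # d/dw log(1 + exp(-m)) = -sigmoid(-m) * (x_i - x_j)
                grad -= (feats[i] - feats[j]) / (1.0 + np.exp(margin))
    return loss, grad


def train_gate(feats, fde, lr=0.05, epochs=100):
    """Full-batch gradient descent on the summed pairwise loss."""
    w = np.zeros(feats.shape[-1])
    for _ in range(epochs):
        total_grad = np.zeros_like(w)
        for scene_feats, scene_fde in zip(feats, fde):
            _, g = pairwise_ranking_loss(w, scene_feats, scene_fde)
            total_grad += g
        w -= lr * total_grad / len(feats)
    return w


def select_expert(w, scene_feats):
    """Pick the expert with the highest ranking score for one scene."""
    return int(np.argmax(scene_feats @ w))


# Synthetic stand-in data: FDE decreases monotonically with a hidden linear score.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, N_EXPERTS, N_FEATURES))
hidden_w = rng.normal(size=N_FEATURES)
fde = np.exp(-0.2 * (feats @ hidden_w))  # positive, lower is better

w = train_gate(feats, fde)
chosen = np.array([select_expert(w, x) for x in feats])
accuracy = float(np.mean(chosen == np.argmin(fde, axis=1)))
print(f"gate picks the lowest-FDE expert in {accuracy:.0%} of synthetic scenes")
```

Because the loss depends only on score differences between experts within the same scene, the scores never need to be calibrated probabilities; only their ordering matters, which is the property the ranking formulation relies on.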
Abstract
Recent deep trajectory predictors (e.g., Jiang et al., 2023; Zhou et al., 2022) have achieved strong average accuracy but remain unreliable in complex long-tail driving scenarios. These limitations reveal the weakness of the prevailing "one-model-fits-all" paradigm, particularly in safety-critical urban contexts where simpler physics-based models can occasionally outperform advanced networks (Kalman, 1960). To bridge this gap, we propose a dynamic multi-expert gating framework that adaptively selects, on a per-sample basis, the most reliable trajectory predictor among a physics-informed LSTM, a Transformer, and a fine-tuned GameFormer. Our method leverages internal model signals (meta-features) such as stability and uncertainty (Gal and Ghahramani, 2016), which we demonstrate to be substantially more informative than geometric scene descriptors. To the best of our knowledge, this is the first work to formulate trajectory expert selection as a pairwise-ranking problem over internal model signals (Burges et al., 2005), directly optimizing decision quality without requiring post-hoc calibration. Evaluated on 1,287 samples from the nuPlan-mini dataset (Caesar et al., 2021), our LLM-enhanced tri-expert gate achieves a Final Displacement Error (FDE) of 2.567 m, a 9.5% reduction relative to GameFormer (2.835 m), and realizes 57.8% of the oracle gap. In open-loop simulations, after trajectory horizon alignment, the same configuration reduces FDE on left-turn scenarios by approximately 10%, demonstrating consistent improvements across both offline validation and open-loop evaluation. These results indicate that adaptive hybrid systems enhance trajectory reliability in safety-critical autonomous driving, providing a practical pathway beyond static single-model paradigms.
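The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the two FDE values and the 57.8% figure quoted above. The oracle FDE it derives is an implied value, not a reported one, and it rests on the assumption that the oracle-gap fraction is defined as (best single expert − gate) / (best single expert − oracle). Variable names are illustrative.

```python
fde_best_single = 2.835  # GameFormer FDE in metres, as quoted in the abstract
fde_gate = 2.567         # LLM-enhanced tri-expert gate FDE in metres

# Relative FDE reduction of the gate over the best single expert.
relative_reduction = (fde_best_single - fde_gate) / fde_best_single

# Assumption (not stated in the text): the realized oracle-gap fraction is
#   (fde_best_single - fde_gate) / (fde_best_single - fde_oracle).
gap_fraction = 0.578
implied_oracle_fde = fde_best_single - (fde_best_single - fde_gate) / gap_fraction

print(f"relative FDE reduction: {relative_reduction:.2%}")
print(f"implied oracle FDE under the stated assumption: {implied_oracle_fde:.3f} m")
```

The relative reduction comes out at about 9.45%, which rounds to the stated 9.5%. Under the assumed definition, the oracle selector would sit at roughly 2.37 m.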
