Dynamic Model Selection for Trajectory Prediction via Pairwise Ranking and Meta-Features

Lu Bowen

TL;DR

This work tackles the reliability gap in autonomous-vehicle trajectory prediction with a dynamic multi-expert gating system that selects the most reliable predictor for each scene. The gate uses a 36-dimensional meta-feature vector encoding model-internal signals (uncertainty, stability, physics violations) and is trained as a pairwise ranking problem, which enables robust per-sample selection without post-hoc calibration. An LLM-based semantic supervisor issues risk-aware overrides in high-conflict cases; with it, the system reaches a final displacement error (FDE) of 2.567 m, a 9.5% improvement over the best single expert, and realizes 57.8% of the oracle gap on nuPlan-mini. Across offline and open-loop evaluations, the approach yields substantial gains in long-tail and safety-critical scenarios, highlighting the practical potential of adaptive hybrid systems for autonomous driving. The study also discusses deployment trade-offs and limitations, including latency and dataset generalization, and suggests future work such as broader expert pools and distillation-based latency reduction.
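
The pairwise-ranking formulation can be illustrated with a RankNet-style objective (the family introduced by Burges et al., 2005). The sketch below is not the authors' implementation: it assumes a shared linear scorer over each expert's meta-feature vector, trained so that, within every sample, the expert with the lower FDE receives the higher score. Because the gate only uses the argmax of the scores, calibrated probabilities are never needed. The linear model, the plain gradient-descent optimiser, the learning rate, the step count, and the synthetic data in the usage lines are all assumptions.

```python
import numpy as np

def pairwise_logistic_loss_and_grad(w, feats, fdes):
    """RankNet-style pairwise loss for a linear scorer s = feats @ w.

    feats: (N, K, D) meta-features for N samples and K experts.
    fdes:  (N, K) per-expert FDE for each sample; lower is better.
    For every ordered pair (i, j) with fde_i < fde_j the loss term is
    log(1 + exp(-(s_i - s_j))), which pushes the better expert's score up.
    """
    scores = feats @ w                                            # (N, K)
    diff = scores[:, :, None] - scores[:, None, :]                # s_i - s_j, (N, K, K)
    better = (fdes[:, :, None] < fdes[:, None, :]).astype(float)  # 1 where expert i beats j
    n_pairs = max(better.sum(), 1.0)
    loss = (better * np.logaddexp(0.0, -diff)).sum() / n_pairs
    # d/d(diff) log(1 + exp(-diff)) = -sigmoid(-diff); the tanh form avoids overflow.
    coef = -better * 0.5 * (1.0 - np.tanh(diff / 2.0)) / n_pairs  # (N, K, K)
    fdiff = feats[:, :, None, :] - feats[:, None, :, :]           # d(diff)/dw, (N, K, K, D)
    grad = (coef[..., None] * fdiff).sum(axis=(0, 1, 2))          # (D,)
    return loss, grad

def train_gate(feats, fdes, lr=0.1, steps=500):
    """Plain gradient descent on the pairwise loss (assumed optimiser)."""
    w = np.zeros(feats.shape[-1])
    for _ in range(steps):
        _, grad = pairwise_logistic_loss_and_grad(w, feats, fdes)
        w -= lr * grad
    return w

def select_expert(w, sample_feats):
    """Pick the highest-scoring expert for one sample; sample_feats is (K, D)."""
    return int(np.argmax(sample_feats @ w))

# Usage on synthetic data (toy assumption: feature 0 drives each expert's error).
rng = np.random.default_rng(0)
N, K, D = 200, 3, 36
feats = rng.normal(size=(N, K, D))
fdes = 5.0 + feats[..., 0] + 0.1 * rng.normal(size=(N, K))
w = train_gate(feats, fdes)
picks = np.array([select_expert(w, feats[n]) for n in range(N)])
acc = float((picks == fdes.argmin(axis=1)).mean())
print(f"agreement with per-sample oracle: {acc:.2f}")
```

A nonlinear scorer (for example, a small MLP) would plug into the same pairwise loss; the linear version only keeps the gradient explicit.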

Abstract

Recent deep trajectory predictors (e.g., Jiang et al., 2023; Zhou et al., 2022) achieve strong average accuracy but remain unreliable in complex long-tail driving scenarios. These limitations expose the weakness of the prevailing "one-model-fits-all" paradigm, particularly in safety-critical urban contexts where simpler physics-based models can occasionally outperform advanced networks (Kalman, 1960). To bridge this gap, we propose a dynamic multi-expert gating framework that adaptively selects, on a per-sample basis, the most reliable trajectory predictor among a physics-informed LSTM, a Transformer, and a fine-tuned GameFormer. Our method leverages internal model signals (meta-features) such as stability and uncertainty (Gal and Ghahramani, 2016), which we show to be substantially more informative than geometric scene descriptors. To the best of our knowledge, this is the first work to formulate trajectory expert selection as a pairwise-ranking problem over internal model signals (Burges et al., 2005), directly optimizing decision quality without requiring post-hoc calibration. Evaluated on the nuPlan-mini dataset (Caesar et al., 2021) with 1,287 samples, our LLM-enhanced tri-expert gate achieves a Final Displacement Error (FDE) of 2.567 m, a 9.5 percent reduction relative to GameFormer (2.835 m), and realizes 57.8 percent of the oracle gap. In open-loop simulation, after trajectory-horizon alignment, the same configuration reduces FDE in left-turn scenarios by approximately 10 percent, so the improvements hold in both offline validation and open-loop evaluation. These results indicate that adaptive hybrid systems enhance trajectory reliability in safety-critical autonomous driving and offer a practical pathway beyond static single-model paradigms.
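
The headline numbers follow from simple arithmetic on mean FDE. The sketch assumes the conventional definitions, which this summary does not state explicitly: FDE is the Euclidean distance between predicted and ground-truth endpoints, the oracle picks the lowest-error expert for every sample, and the fraction of the oracle gap realized is (FDE_best - FDE_gate) / (FDE_best - FDE_oracle). The function names are hypothetical, and the oracle FDE is left as an input because its value is not reported here.

```python
import math

def fde(pred_endpoint, gt_endpoint):
    """Final displacement error: Euclidean distance between endpoints, in metres."""
    return math.dist(pred_endpoint, gt_endpoint)

def oracle_fde(per_sample_errors):
    """Mean FDE of a per-sample oracle that always picks the lowest-error expert.

    per_sample_errors: one list of expert FDEs per sample.
    """
    return sum(min(errs) for errs in per_sample_errors) / len(per_sample_errors)

def relative_improvement(fde_baseline, fde_method):
    """Fractional FDE reduction of the method relative to a baseline."""
    return (fde_baseline - fde_method) / fde_baseline

def oracle_gap_realized(fde_best_single, fde_method, fde_oracle):
    """Share of the best-single-expert-to-oracle gap closed by the method (assumed definition)."""
    gap = fde_best_single - fde_oracle
    if gap <= 0:
        raise ValueError("the oracle must improve on the best single expert")
    return (fde_best_single - fde_method) / gap

# Reported values: GameFormer (best single expert) 2.835 m, LLM-enhanced gate 2.567 m.
print(f"{relative_improvement(2.835, 2.567):.1%}")  # prints 9.5%
```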

Paper Structure

This paper contains 60 sections, 2 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Hybrid gating architecture. A tri-expert ensemble provides candidate trajectories whose internal signals feed the meta-feature extractor; the ranking gate handles most scenes, while an LLM supervisor issues semantic overrides on difficult, high-risk cases.
  • Figure 2: Long-tail scenario performance on 99 high-risk scenes in which the baseline GameFormer fails severely. Our LLM-enhanced gate achieves substantial error reductions in all four critical scenario types: 60.4% in intersections, 44.9% in cut-ins, 94.7% in high-speed maneuvers, and 53.0% in occlusions.
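
The control flow in Figure 1 can be sketched as a two-stage router: score every expert from its meta-features, keep the gate's top choice, and escalate to the supervisor only when the decision looks contested. Everything concrete here is a hypothetical stand-in. The three toy meta-features (endpoint spread across modes for uncertainty, mean jerk for stability, a count of steps above an acceleration limit for physics violations) are illustrative and are not the paper's 36-dimensional vector. The time step DT, the limit A_MAX, the margin and disagreement thresholds, the escalation rule itself, and the supervisor callable, which replaces the LLM query, are all assumptions.

```python
from typing import Callable, Sequence
import numpy as np

DT = 0.1      # assumed spacing between trajectory points (s)
A_MAX = 8.0   # assumed plausibility limit on acceleration (m/s^2)

def meta_features(modes: np.ndarray) -> np.ndarray:
    """Three toy internal signals for one expert (hypothetical, not the paper's features).

    modes: (M, T, 2) predicted xy trajectories, T >= 4; mode 0 is the committed one.
    """
    traj = modes[0]
    vel = np.diff(traj, axis=0) / DT
    acc = np.diff(vel, axis=0) / DT
    jerk = np.diff(acc, axis=0) / DT
    ends = modes[:, -1]
    uncertainty = np.linalg.norm(ends - ends.mean(axis=0), axis=1).mean()  # endpoint spread
    instability = np.linalg.norm(jerk, axis=1).mean()                      # mean jerk
    violations = (np.linalg.norm(acc, axis=1) > A_MAX).sum()               # physics violations
    return np.array([uncertainty, instability, violations], dtype=float)

# Hypothetical supervisor signature: (candidate modes, gate scores) -> chosen expert index.
Supervisor = Callable[[Sequence[np.ndarray], np.ndarray], int]

def route(candidates: Sequence[np.ndarray], w: np.ndarray, supervisor: Supervisor,
          margin_tau: float = 0.5, disagreement_tau: float = 2.0):
    """Return (expert index, decision source) for one scene.

    The gate's top choice stands unless its score margin is small AND the experts'
    committed endpoints disagree strongly; then the supervisor decides.
    """
    feats = np.stack([meta_features(m) for m in candidates])   # (K, D)
    scores = feats @ w
    order = np.argsort(scores)[::-1]                            # best score first
    margin = scores[order[0]] - scores[order[1]]
    ends = [m[0, -1] for m in candidates]
    disagreement = max(np.linalg.norm(a - b) for a in ends for b in ends)
    if margin < margin_tau and disagreement > disagreement_tau:
        return supervisor(candidates, scores), "supervisor"
    return int(order[0]), "gate"

# Usage: constant-velocity experts, each with two modes.
def straight(vx, vy, T=10):
    t = np.arange(1, T + 1) * DT
    return np.stack([vx * t, vy * t], axis=1)

tight = np.stack([straight(10, 0), straight(10, 0)])
spread = np.stack([straight(10, 0), straight(0, 10)])
w = np.array([-1.0, 0.0, -1.0])  # hypothetical weights: penalise uncertainty and violations
print(route([tight, spread, spread], w, lambda cands, scores: 2))  # prints (0, 'gate')
```

Requiring both a small margin and large disagreement keeps the slow supervisor call off the common path, consistent with the caption's statement that the ranking gate handles most scenes.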