Online Aggregation of Trajectory Predictors

Alex Tong, Apoorva Sharma, Sushant Veer, Marco Pavone, Heng Yang

TL;DR

The paper tackles distribution shift in trajectory forecasting with an online, model-agnostic mixture-of-experts (MoE) framework that aggregates multiple predictors as black-box experts. It casts the update of the mixture weights $\alpha \in \Delta_N$ as an online convex optimization problem and uses the Squint algorithm for fast adaptation with sublinear regret; the loss is computed from the true next-state observation $x_t$. The approach extends to nonconvex objectives via differentiable surrogates (softsort/softmin) and handles nonstationary environments with a discount factor that forgets outdated information. Empirically, predictors pretrained on Boston, Singapore, and Las Vegas are fused and evaluated on Pittsburgh and Lyft data, where the online MoE matches or surpasses the best single predictor and adapts under out-of-distribution (OOD) conditions. Overall, the method provides a practical, scalable way to improve the robustness of multimodal trajectory prediction for autonomous driving.

Abstract

Trajectory prediction, the task of forecasting future agent behavior from past data, is central to safe and efficient autonomous driving. A diverse set of methods (e.g., rule-based or learned with different architectures and datasets) have been proposed, yet it is often the case that the performance of these methods is sensitive to the deployment environment (e.g., how well the design rules model the environment, or how accurately the test data match the training data). Building upon the principled theory of online convex optimization but also going beyond convexity and stationarity, we present a lightweight and model-agnostic method to aggregate different trajectory predictors online. We propose treating each individual trajectory predictor as an "expert" and maintaining a probability vector to mix the outputs of different experts. Then, the key technical approach lies in leveraging online data -- the true agent behavior to be revealed at the next timestep -- to form a convex-or-nonconvex, stationary-or-dynamic loss function whose gradient steers the probability vector towards choosing the best mixture of experts. We instantiate this method to aggregate trajectory predictors trained on different cities in the nuScenes dataset and show that it performs just as well as, if not better than, any singular model, even when deployed on the out-of-distribution Lyft dataset.
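
As a rough illustration of the loop described in the abstract, the sketch below keeps a probability vector over experts and updates it from the likelihood each expert assigned to the observed next state. It is not the paper's implementation: it swaps in a discounted exponentiated-gradient (EG) update in place of Squint, and the function names, the step size `eta`, the discount `gamma`, and the way the discount is applied (decaying the accumulated gradient) are all assumptions made for this example. The loss $-\log \sum_i \alpha_i p_i(x_t)$ is one convex choice that is consistent with the NLL metric reported in the figures.

```python
import math


def nll_and_grad(alpha, likelihoods):
    """Mixture NLL, -log(sum_i alpha_i * p_i), and its gradient w.r.t. alpha.

    `likelihoods[i]` is the (hypothetical) density that expert i assigned
    to the observation that was just revealed.
    """
    mix = sum(a * p for a, p in zip(alpha, likelihoods))
    mix = max(mix, 1e-12)  # guard against log(0)
    loss = -math.log(mix)
    grad = [-p / mix for p in likelihoods]
    return loss, grad


def weights_from_cumulative_grad(cum_grad, eta):
    """Softmax of -eta * cum_grad, computed in a numerically stable way."""
    scores = [-eta * g for g in cum_grad]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_online_mixture(likelihood_stream, n_experts, eta=0.5, gamma=0.95):
    """Discounted EG over a stream of per-expert likelihoods (assumed setup).

    Returns the final weights and a history of (loss, weights) pairs, where
    each loss is evaluated with the weights held *before* that observation.
    """
    cum_grad = [0.0] * n_experts
    alpha = [1.0 / n_experts] * n_experts  # start from the uniform mixture
    history = []
    for likelihoods in likelihood_stream:
        loss, grad = nll_and_grad(alpha, likelihoods)
        history.append((loss, list(alpha)))
        # Decay old gradients so outdated evidence is gradually forgotten.
        cum_grad = [gamma * c + g for c, g in zip(cum_grad, grad)]
        alpha = weights_from_cumulative_grad(cum_grad, eta)
    return alpha, history


if __name__ == "__main__":
    # Toy stream: expert 1 explains the data first, then expert 0 does.
    stream = [[0.1, 0.9, 0.1]] * 50 + [[0.9, 0.1, 0.1]] * 100
    final_alpha, _ = run_online_mixture(stream, n_experts=3)
    print([round(a, 3) for a in final_alpha])
```

With `gamma < 1`, older gradients decay geometrically, so the weights can move to a different expert after the data distribution changes; `gamma = 1` recovers the undiscounted EG update.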

Paper Structure

This paper contains 15 sections, 16 equations, 7 figures, 2 algorithms.

Figures (7)

  • Figure 1: Online aggregation of trajectory predictors.
  • Figure 2: Performance of MoE in a stationary distribution shift (from the Pittsburgh environment) using (a) convex loss and (b) nonconvex loss. From top to bottom: NLL performance of the MoE compared with the singular models; minADE performance of the MoE compared with the singular models; evolution of the probability vector. minFDE performance is shown in the appendix.
  • Figure 3: Performance of MoE in a stationary distribution shift (from the Lyft dataset) using (a) convex loss and (b) nonconvex loss. From top to bottom: NLL performance of the MoE compared with the singular models; minADE performance of the MoE compared with the singular models; evolution of the probability vector.
  • Figure 4: Performance of MoE in nonstationary distribution shifts using (a) convex loss and (b) nonconvex loss. From top to bottom: NLL performance of the MoE compared with the singular models; minADE performance of the MoE compared with the singular models; evolution of the probability vector. minFDE performance is shown in the appendix.
  • Figure 5: squint vs EG on the Las Vegas dataset.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Remark 1: Generality