Table of Contents
Fetching ...

DECODE: Domain-aware Continual Domain Expansion for Motion Prediction

Boqi Li, Haojie Zhu, Henry X. Liu

TL;DR

DECODE is introduced, a novel continual learning framework that begins with a pre-trained generalized model and incrementally develops specialized models for distinct domains, and merges outputs from the most relevant specialized and generalized models using deep Bayesian uncertainty estimation techniques.

Abstract

Motion prediction is critical for autonomous vehicles to effectively navigate complex environments and accurately anticipate the behaviors of other traffic participants. As autonomous driving continues to evolve, the need to assimilate new and varied driving scenarios necessitates frequent model updates through retraining. To address these demands, we introduce DECODE, a novel continual learning framework that begins with a pre-trained generalized model and incrementally develops specialized models for distinct domains. Unlike existing continual learning approaches that attempt to develop a unified model capable of generalizing across diverse scenarios, DECODE uniquely balances specialization with generalization, dynamically adjusting to real-time demands. The proposed framework leverages a hypernetwork to generate model parameters, significantly reducing storage requirements, and incorporates a normalizing flow mechanism for real-time model selection based on likelihood estimation. Furthermore, DECODE merges outputs from the most relevant specialized and generalized models using deep Bayesian uncertainty estimation techniques. This integration ensures optimal performance in familiar conditions while maintaining robustness in unfamiliar scenarios. Extensive evaluations confirm the effectiveness of the framework, achieving a notably low forgetting rate of 0.044 and an average minADE of 0.584 m, significantly surpassing traditional learning strategies and demonstrating adaptability across a wide range of driving conditions.

DECODE: Domain-aware Continual Domain Expansion for Motion Prediction

TL;DR

DECODE is introduced, a novel continual learning framework that begins with a pre-trained generalized model and incrementally develops specialized models for distinct domains, and merges outputs from the most relevant specialized and generalized models using deep Bayesian uncertainty estimation techniques.

Abstract

Motion prediction is critical for autonomous vehicles to effectively navigate complex environments and accurately anticipate the behaviors of other traffic participants. As autonomous driving continues to evolve, the need to assimilate new and varied driving scenarios necessitates frequent model updates through retraining. To address these demands, we introduce DECODE, a novel continual learning framework that begins with a pre-trained generalized model and incrementally develops specialized models for distinct domains. Unlike existing continual learning approaches that attempt to develop a unified model capable of generalizing across diverse scenarios, DECODE uniquely balances specialization with generalization, dynamically adjusting to real-time demands. The proposed framework leverages a hypernetwork to generate model parameters, significantly reducing storage requirements, and incorporates a normalizing flow mechanism for real-time model selection based on likelihood estimation. Furthermore, DECODE merges outputs from the most relevant specialized and generalized models using deep Bayesian uncertainty estimation techniques. This integration ensures optimal performance in familiar conditions while maintaining robustness in unfamiliar scenarios. Extensive evaluations confirm the effectiveness of the framework, achieving a notably low forgetting rate of 0.044 and an average minADE of 0.584 m, significantly surpassing traditional learning strategies and demonstrating adaptability across a wide range of driving conditions.

Paper Structure

This paper contains 16 sections, 19 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of DECODE framework: (a) The initial dataset updates the hypernetwork and creates a domain query to generate parameters for specialized model 1. (b) During inference, scenarios similar to the first domain achieve high likelihood scores from the normalizing flow, prompting the use of specialized model 1; those outside the domain have low likelihoods, leading to the use of the generalized model. (c) A new dataset is available, creating another domain query while ensuring consistent parameters for previous queries. (d) During subsequent inference, previously unfamiliar scenarios that now gain higher likelihood scores will utilize specialized model 2.
  • Figure 2: Overall Framework of DECODE: A set of domain queries serve as inputs to a hypernetwork, which dynamically generates parameters for both the normalizing flow and the decoder. The normalizing flow models, parameterized differently for each domain, generate distributions for domain-specific features. These distributions help identify the most suitable specialized model for each scene, based on the hidden representations produced by the encoder from the input space. The selected specialized decoder then generates future motion predictions for that scene. Outputs from the generalized model are integrated to ensure that the final prediction remains reliable and within performance bounds.
  • Figure 3: Modified Architecture of MTR: (a) Scene Encoder processes scene input $x$ and outputs hidden representations for vehicles and lanes. (b) Hypernetwork receives a domain query $q$ and generates parameters for (d) the specialized decoder and (e) the specialized normalizing flow. (c) Generalized decoder, like (a), is pre-trained and fixed to ensure stability. Details of (a) and (b) are omitted to focus on the decoding process. During decoding, outputs from (c) feed into (d), functioning as an additional layer in the transformer decoder. With the evidence $e$ from (e), and the parameters $\chi_{[0]}$ from (c) and $\chi_{[m^*]}$ from (d), the posterior parameters are updated, ultimately leading to the sampling of the predicted future motion.
  • Figure 4: Structure of the chunked hypernetwork: The hypernetwork is designed to reuse its model structure by incorporating two distinct sets of chunk embeddings, $\{b_\omega^i\}_{i=1}^{N_\omega}$ for the normalizing flow and $\{b_\theta^i\}_{i=1}^{N_\theta}$ for the decoder. Each domain query $q$ is transformed and then concatenated with each set of chunk embeddings before being fed into the hypernetwork. The resulting outputs are $\Theta_\omega=\{\Theta_\omega^i\}_{i=1}^{N_\omega}$ and $\Theta_\omega=\{\Theta_\theta^i\}_{i=1}^{N_\theta}$, which respectively form the parameters for the normalizing flow and the decoder, as depicted in the arrows leading from the outputs of the hypernetwork to their respective application modules.
  • Figure 5: Epoch vs. minADE on Validation Sets in the DECODE Framework Across Three Learning Phases: Solid lines represent the performance of the proposed DECODE framework, while dashed lines indicate the benchmark performance of the pre-trained generalized model. The color coding is used to distinguish between domains: red for the RounD (D1) domain, blue for the HighD (D2) domain, and green for the InD (D3) domain.
  • ...and 4 more figures