Multi-modal Integrated Prediction and Decision-making with Adaptive Interaction Modality Explorations
Tong Li, Lu Zhang, Sikang Liu, Shaojie Shen
TL;DR
This work tackles prediction and planning for autonomous driving in dense, dynamic traffic by introducing MIND, a framework that jointly predicts scene-level futures and ego decisions using a transformer-based predictor with Gaussian mixture model outputs, combined with Adaptive Interaction Modality Exploration (AIME) to build a scenario tree. AIME dynamically branches the scenario tree based on the variation of uncertainty, then prunes and merges branches according to their interaction modalities to keep the tree compact. Contingency planning operates on the resulting scenario trees to produce trajectory trees that are optimized over multi-modal evolutions subject to safety constraints. Evaluations on the Argoverse 2 dataset show that MIND outperforms strong baselines in both open-loop prediction and closed-loop driving simulation, indicating practical potential for reliable, interactive autonomous driving in complex environments.
Abstract
Navigating dense and dynamic environments poses a significant challenge for autonomous driving systems, owing to the intricate nature of multimodal interaction, wherein the actions of the various traffic participants and the autonomous vehicle are complex and implicitly coupled. In this paper, we propose a novel framework, Multi-modal Integrated predictioN and Decision-making (MIND), which addresses these challenges by efficiently generating joint predictions and decisions covering multiple distinctive interaction modalities. Specifically, MIND leverages learning-based scenario predictions to obtain integrated predictions and decisions with socially consistent interaction modalities. It then uses a modality-aware dynamic branching mechanism to generate scenario trees that efficiently capture the evolution of distinctive interaction modalities while keeping the variation of interaction uncertainty low along the planning horizon. The scenario trees are used directly by contingency planning under interaction uncertainty to obtain clear and considerate maneuvers that account for multi-modal evolutions. Comprehensive experimental results in closed-loop simulation based on a real-world driving dataset show that MIND outperforms other strong baselines under various driving contexts.
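The dynamic branching idea described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the scalar per-step uncertainty measure, the threshold rule, and the distance-based merge test are all assumptions chosen only to show the general shape of "branch when uncertainty grows, then merge modes that behave alike".

```python
import math

# Hypothetical helpers for illustration only; none of these names come from the paper.

def first_branch_step(uncertainty, threshold):
    """Return the first step whose uncertainty exceeds the threshold.

    `uncertainty` is a sequence with one scalar per time step (assumed measure).
    If the threshold is never exceeded, the horizon length is returned,
    meaning the branch does not need to split.
    """
    for step, value in enumerate(uncertainty):
        if value > threshold:
            return step
    return len(uncertainty)


def max_gap(traj_a, traj_b):
    """Largest pointwise Euclidean distance between two equal-length 2D trajectories."""
    return max(
        math.hypot(ax - bx, ay - by)
        for (ax, ay), (bx, by) in zip(traj_a, traj_b)
    )


def merge_similar_modes(trajs, probs, dist_tol):
    """Greedily merge modes that stay within `dist_tol` of a more likely mode.

    Modes are visited in order of decreasing probability. A mode that is close
    to an already kept mode adds its probability to that mode and is dropped.
    Returns a list of (trajectory, probability) pairs.
    """
    order = sorted(range(len(trajs)), key=lambda k: -probs[k])
    kept = []  # each entry is [trajectory, accumulated probability]
    for k in order:
        for entry in kept:
            if max_gap(trajs[k], entry[0]) < dist_tol:
                entry[1] += probs[k]
                break
        else:
            kept.append([trajs[k], probs[k]])
    return [(traj, prob) for traj, prob in kept]


# Toy data: two nearly identical "go straight" modes and one lateral deviation.
straight = [(float(t), 0.0) for t in range(4)]
straight_shifted = [(float(t), 0.1) for t in range(4)]
lateral = [(float(t), 3.0) for t in range(4)]

branch_at = first_branch_step([0.1, 0.2, 0.9, 1.5], threshold=0.5)
merged = merge_similar_modes(
    [straight, straight_shifted, lateral], [0.5, 0.3, 0.2], dist_tol=0.5
)
```

In this toy setup the two near-identical modes collapse into one branch whose probability is the sum of both, while the lateral deviation survives as a separate branch. A real system would presumably derive the uncertainty measure and the modality grouping from the learned predictor rather than from fixed thresholds.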
