Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving
Yinzhe Shen, Omer Sahin Tas, Kaiwen Wang, Royden Wagner, Christoph Stiller
TL;DR
This paper tackles negative transfer in end-to-end autonomous driving by decoupling semantic and motion learning. The proposed method, DMAD, pairs a Neural-Bayes motion decoder with an Interactive Semantic Decoder; the two share reference points but propagate gradients separately. This design runs detection, tracking, and prediction in parallel and lets object and map perception exchange semantic information. Integrated with UniAD and SparseDrive, DMAD improves perception, prediction, and planning on nuScenes. Key contributions are decoupled motion learning via a Bayes-filter-inspired recursion, bidirectional semantic interaction, and ablations with SHAP-based analysis. The results indicate that dividing and then merging heterogeneous information improves downstream planning performance and safety metrics.
Abstract
Perceiving the environment and its changes over time corresponds to two fundamental yet heterogeneous types of information: semantics and motion. Previous end-to-end autonomous driving works represent both types of information in a single feature vector. However, including motion-related tasks, such as prediction and planning, impairs detection and tracking performance, a phenomenon known as negative transfer in multi-task learning. To address this issue, we propose Neural-Bayes motion decoding, a novel parallel detection, tracking, and prediction method that separates semantic and motion learning. Specifically, we employ a set of learned motion queries that operate in parallel with detection and tracking queries, sharing a unified set of recursively updated reference points. Moreover, we employ interactive semantic decoding to enhance information exchange in semantic tasks, promoting positive transfer. Experiments on the nuScenes dataset with UniAD and SparseDrive confirm the effectiveness of our divide-and-merge approach, resulting in performance improvements across perception, prediction, and planning. Our code is available at https://github.com/shenyinzhe/DMAD.
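The idea of separate semantic and motion states attached to one recursively updated reference point can be illustrated with a toy example. The sketch below is not the DMAD decoder: it swaps the learned queries for hand-written stand-ins and uses a classical predict/update recursion of the kind a Bayes filter performs. All names (`ObjectSlot`, `predict`, `update`, `step`) and the fixed `gain` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class ObjectSlot:
    """One tracked object: a shared reference point plus two separate states.

    Assumption: `semantic` and `velocity` are simple stand-ins for the
    semantic (detection/tracking) query and the motion query.
    """
    ref_point: Point   # shared by the semantic and motion branches
    semantic: str      # stand-in for a semantic query
    velocity: Point    # stand-in for a motion query


def predict(slot: ObjectSlot, dt: float) -> Point:
    """Motion branch: propagate the reference point to the next frame."""
    x, y = slot.ref_point
    vx, vy = slot.velocity
    return (x + vx * dt, y + vy * dt)


def update(predicted: Point, observed: Point, gain: float) -> Point:
    """Semantic branch: correct the predicted point with a new detection."""
    px, py = predicted
    ox, oy = observed
    return (px + gain * (ox - px), py + gain * (oy - py))


def step(slot: ObjectSlot, observed: Point,
         dt: float = 0.5, gain: float = 0.5) -> ObjectSlot:
    """One recursion: predict, then update, then refresh the motion state.

    The semantic state is passed through unchanged; only the shared
    reference point links the two branches.
    """
    predicted = predict(slot, dt)
    corrected = update(predicted, observed, gain)
    x0, y0 = slot.ref_point
    new_velocity = ((corrected[0] - x0) / dt, (corrected[1] - y0) / dt)
    return ObjectSlot(corrected, slot.semantic, new_velocity)


if __name__ == "__main__":
    slot = ObjectSlot(ref_point=(0.0, 0.0), semantic="car", velocity=(2.0, 0.0))
    slot = step(slot, observed=(2.0, 0.0))
    print(slot)
```

In this toy version the two branches communicate only through `ref_point`, which mirrors the abstract's description of motion queries and detection and tracking queries sharing one set of reference points. How the actual method learns these updates and keeps gradients separate is defined in the paper and the linked repository, not here.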
