DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception
Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun
TL;DR
This paper addresses low-latency, high-accuracy streaming perception for autonomous driving by introducing DyRoNet, a dynamic routing framework that selects among a bank of environment-tuned streaming perception branches. A speed router uses frame-difference signals to infer environmental velocity and routes each input to the most suitable branch, while Low-Rank Adaptation (LoRA) enables efficient sub-model adaptation without full fine-tuning. The training objective combines a streaming perception loss with an effective-and-efficient loss, guiding both branch accuracy and router efficiency. Experiments on Argoverse-HD (and NuScenes-H) show higher sAP than state-of-the-art methods at competitive inference times. The results establish a new benchmark for streaming perception and provide practical insights into dynamic model selection, router design, and LoRA-based adaptation for real-time autonomous driving systems.
Abstract
The advancement of autonomous driving systems hinges on achieving low-latency, high-accuracy perception. To address this need, this paper introduces the Dynamic Routing Network (DyRoNet), a low-rank enhanced dynamic routing framework designed for streaming perception in autonomous driving systems. DyRoNet integrates a suite of pre-trained branch networks, each fine-tuned to operate under distinct environmental conditions. At its core, the framework features a speed router module that assesses input data and routes it to the most suitable branch for processing. This approach addresses the inherent limitations of conventional models in adapting to diverse driving conditions while balancing performance and efficiency. Extensive experimental evaluations demonstrate the adaptability of DyRoNet to diverse branch selection strategies, resulting in significant performance enhancements across different scenarios. This work establishes a new benchmark for streaming perception and provides valuable engineering insights for future work.
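The two mechanisms named above, routing on a frame-difference signal and low-rank adaptation of a branch, can be illustrated with a minimal sketch. It is not the authors' implementation. The router in the paper is a trained module, whereas this sketch substitutes hand-picked thresholds on the mean absolute frame difference. The function names, the threshold values, and the plain-list representation of frames and weights are all assumptions made for illustration. The LoRA part uses the standard formulation y = Wx + (alpha / r) * B(Ax), with W kept frozen.

```python
from typing import List, Sequence

Frame = List[List[float]]   # grayscale frame as rows of pixel intensities
Matrix = List[List[float]]
Vector = List[float]


def mean_abs_frame_difference(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def route(prev: Frame, curr: Frame, thresholds: Sequence[float]) -> int:
    """Pick a branch index from the motion score.

    Assumption: thresholds are sorted ascending and hand-picked; the index is
    the number of thresholds the score exceeds (0 = slowest-scene branch).
    """
    score = mean_abs_frame_difference(prev, curr)
    return sum(score > t for t in thresholds)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """Standard LoRA forward pass: W x + (alpha / rank) * B (A x).

    W (d_out x d_in) is the frozen base weight; only A (rank x d_in) and
    B (d_out x rank) would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    prev = [[0.0, 0.0], [0.0, 0.0]]
    curr = [[4.0, 4.0], [4.0, 4.0]]
    # Hypothetical thresholds separating slow / medium / fast scenes.
    print("branch:", route(prev, curr, thresholds=(2.0, 8.0)))
```

With B initialised to zeros, as is usual for LoRA, `lora_forward` returns exactly the frozen branch output, so adaptation starts from the pre-trained behaviour and only the small matrices A and B need to be stored per environment.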
