Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving

Haibo Hu, Lianming Huang, Xinyu Wang, Yufei Cui, Shangyu Wu, Nan Guan, Chun Jason Xue

TL;DR

Nav-EE is a navigation-guided early-exit framework that precomputes task-specific exit layers offline and applies them dynamically online based on navigation priors. The results suggest that coupling navigation foresight with early exit offers a viable path toward efficient deployment of large models in autonomous systems.

Abstract

Vision-Language Models (VLMs) are increasingly applied in autonomous driving for unified perception and reasoning, but high inference latency hinders real-time deployment. Early-exit reduces latency by terminating inference at intermediate layers, yet its task-dependent nature limits generalization across diverse scenarios. We observe that this limitation aligns with autonomous driving: navigation systems can anticipate upcoming contexts (e.g., intersections, traffic lights), indicating which tasks will be required. We propose Nav-EE, a navigation-guided early-exit framework that precomputes task-specific exit layers offline and dynamically applies them online based on navigation priors. Experiments on CODA, Waymo, and BOSCH show that Nav-EE achieves accuracy comparable to full inference while reducing latency by up to 63.9%. Real-vehicle integration with Autoware Universe further demonstrates reduced inference latency (600ms to 300ms), supporting faster decision-making in complex scenarios. These results suggest that coupling navigation foresight with early-exit offers a viable path toward efficient deployment of large models in autonomous systems. Code and data are available at our anonymous repository: https://anonymous.4open.science/r/Nav-EE-BBC4
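
The two-stage idea in the abstract (offline profiling, then online lookup driven by navigation context) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the selection rule (earliest layer whose accuracy is within a tolerance of the final layer), the per-layer accuracy numbers, the context names, and all function names are assumptions made up for illustration.

```python
from typing import Dict, List

def pick_exit_layer(layer_accuracy: List[float], tolerance: float) -> int:
    """Return the earliest 1-based layer whose accuracy is within
    `tolerance` of the final layer (assumed selection rule)."""
    target = layer_accuracy[-1] - tolerance
    for layer, acc in enumerate(layer_accuracy, start=1):
        if acc >= target:
            return layer
    return len(layer_accuracy)

def build_exit_table(profiles: Dict[str, List[float]], tolerance: float) -> Dict[str, int]:
    """Offline stage: map each task to its precomputed exit layer."""
    return {task: pick_exit_layer(acc, tolerance) for task, acc in profiles.items()}

# Hypothetical mapping from a navigation context to the task it implies.
CONTEXT_TO_TASK = {
    "traffic_light_zone": "traffic_light",
    "pedestrian_area": "pedestrian",
}

def select_exit_layer(context: str, exit_table: Dict[str, int], full_depth: int) -> int:
    """Online stage: use the navigation prior to choose an exit layer,
    falling back to full-depth inference for unknown contexts."""
    task = CONTEXT_TO_TASK.get(context)
    return exit_table.get(task, full_depth)

# Made-up per-layer validation accuracies for a toy 6-layer model.
profiles = {
    "traffic_light": [0.20, 0.55, 0.90, 0.91, 0.91, 0.91],
    "pedestrian": [0.10, 0.30, 0.50, 0.80, 0.89, 0.90],
}
exit_table = build_exit_table(profiles, tolerance=0.02)
layer_at_light = select_exit_layer("traffic_light_zone", exit_table, full_depth=6)
layer_unknown = select_exit_layer("highway", exit_table, full_depth=6)
```

The fallback to full depth reflects one cautious design choice: when the navigation prior does not identify a task, the sketch runs every layer rather than guessing an exit point.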

Paper Structure

This paper contains 22 sections, 5 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Full-layer inference processes all transformer layers, while generic EE methods stop once predictions stabilize. With navigation priors, VLMs in autonomous driving can adopt more aggressive exits, avoiding redundant computation while preserving accuracy.
  • Figure 2: Layer-wise predictions of LLaVA-7B on CODA object recognition, showing early stabilization at the correct label (‘car’).
  • Figure 3: Navigation priors in autonomous driving. HD maps predict upcoming contexts (e.g., traffic-light zones, intersections, pedestrian areas), which Nav-EE uses to trigger task-specific early exits.
  • Figure 4: Overview of Nav-EE: offline profiling identifies task-specific exit layers, which are dynamically triggered by navigation priors for efficient inference in autonomous driving.
  • Figure 5: Navigation-aware setup: using Lanelet2 and the ROS2 /map/vector_map topic to anticipate upcoming traffic lights, enabling VLMs to switch to traffic-light task configurations.
  • ...and 1 more figure