EdgeNav-QE: QLoRA Quantization and Dynamic Early Exit for LAM-based Navigation on Edge Devices

Mengyun Liu, Shanshan Huang, Jianan Jiang

TL;DR

EdgeNav-QE is a framework that integrates Quantized Low-Rank Adaptation (QLoRA) with a dynamic early-exit (DEE) mechanism to optimize LAMs for real-time edge navigation. The model terminates inference early for simple navigation tasks and retains full depth for complex decision-making.

Abstract

Large Action Models (LAMs) have shown immense potential in autonomous navigation by bridging high-level reasoning with low-level control. However, deploying these multi-billion-parameter models on edge devices remains a significant challenge due to memory constraints and latency requirements. In this paper, we propose EdgeNav-QE, a novel framework that integrates Quantized Low-Rank Adaptation (QLoRA) with a dynamic early-exit (DEE) mechanism to optimize LAMs for real-time edge navigation. By quantizing the backbone to 4-bit precision and strategically placing early-exit branches, we enable the model to terminate inference early for simple navigation tasks while retaining full depth for complex decision-making. Experimental results in the Habitat-Sim environment with the Matterport3D dataset, using an OpenVLA-7B backbone, demonstrate that EdgeNav-QE reduces inference latency by 82.7% and memory footprint by 66.7% compared to full-precision baselines, while maintaining an 81.8% navigation success rate. Furthermore, it outperforms the state-of-the-art static early-exit method by 17.9% in latency, demonstrating the superiority of content-aware adaptive computation for safety-critical applications.
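To put the reported reductions in more familiar terms, a percentage reduction r corresponds to a factor of 1 / (1 - r). The short sketch below applies that conversion to the two figures quoted in the abstract. The function name `reduction_to_factor` is ours, not the paper's, and the resulting factors are derived arithmetic rather than numbers reported by the authors.

```python
def reduction_to_factor(reduction_pct: float) -> float:
    """Convert a percentage reduction into a 'times smaller' factor.

    Example: a 50% reduction means the new value is 1/2 of the old one,
    i.e. a factor of 2.
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Figures quoted in the abstract (relative to full-precision baselines).
latency_factor = reduction_to_factor(82.7)  # inference latency reduction
memory_factor = reduction_to_factor(66.7)   # memory footprint reduction

print(f"latency: about {latency_factor:.2f}x faster")
print(f"memory:  about {memory_factor:.2f}x smaller")
```

Read this way, the abstract's figures amount to roughly a 5.8x latency speedup and a 3x smaller memory footprint.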

Paper Structure

This paper contains 28 sections, 11 equations, 3 figures, 6 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The Edge Bottleneck Paradox. An example of the resource and safety constraints. LAMs require 3.5× memory and 2.5× latency beyond edge limits, creating a 2.5m collision blind spot and 90% computational waste.
  • Figure 2: Overview of the EdgeNav-QE Framework for Edge Navigation. The workflow begins with local RGB-D perception on the edge device. The LAM backbone utilizes frozen 4-bit quantization for memory efficiency, adapted via LoRA. The Dynamic Early Exit mechanism (red path) acts as an adaptive gate: simple navigation states trigger a fast, early exit to minimize latency, while complex states traverse the full network depth to ensure safety. This allows the high-capacity LAM to operate within the tight latency constraints of real-time robotics.
  • Figure 3: ObjectGoal Navigation Task. An example from Habitat Challenge 2023.
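The adaptive gate described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the exit criterion (a threshold on the maximum softmax probability), the threshold value 0.9, the placement of exit heads, and all names (`navigate_step`, `exit_heads`, `final_head`) are assumptions made for illustration, and the toy "layers" stand in for the quantized LAM backbone.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def navigate_step(
    state: Vector,
    layers: List[Layer],
    exit_heads: Dict[int, Layer],
    final_head: Layer,
    threshold: float = 0.9,  # assumed confidence threshold, not from the paper
) -> Tuple[int, int]:
    """Return (action index, number of layers executed)."""
    hidden = state
    for index, layer in enumerate(layers):
        hidden = layer(hidden)
        head = exit_heads.get(index)
        if head is None:
            continue  # no exit branch attached after this layer
        probs = softmax(head(hidden))
        confidence = max(probs)
        if confidence >= threshold:
            # Confident enough: take the early exit and skip deeper layers.
            return probs.index(confidence), index + 1
    # No branch was confident: use the full-depth prediction.
    probs = softmax(final_head(hidden))
    return probs.index(max(probs)), len(layers)


def double(hidden: Vector) -> Vector:
    """Toy layer: sharpens the hidden state by doubling it."""
    return [2.0 * x for x in hidden]


def identity_head(hidden: Vector) -> Vector:
    """Toy head: treats the hidden state directly as action logits."""
    return list(hidden)


layers = [double, double, double, double]
exit_heads = {0: identity_head, 1: identity_head}  # branches after layers 1 and 2

simple_result = navigate_step([3.0, 0.0, 0.0], layers, exit_heads, identity_head)
complex_result = navigate_step([0.2, 0.1, 0.0], layers, exit_heads, identity_head)
print(simple_result, complex_result)
```

In this toy setup the unambiguous state exits after the first layer, while the ambiguous state fails both confidence checks and runs all four layers, which mirrors the fast path and the full-depth path described in the caption.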