Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

Yang Li; Ruichen Zhang; Yinqiu Liu; Guangyuan Liu; Dusit Niyato; Abbas Jamalipour; Xianbin Wang; Dong In Kim

TL;DR

This work addresses real-time Vision-Language Model (VLM) inference onboard UAVs in Low-Altitude Economy Networks (LAENets) under tight resource constraints. It introduces a hierarchical optimization framework (ARPO-LLaRA) that jointly optimizes image resolution, uplink power, and UAV trajectory. ARPO solves the resolution and power subproblem using Branch-and-Bound and the Karush-Kuhn-Tucker (KKT) conditions. LLaRA uses LLM-assisted reward design to improve PPO-based trajectory optimization, achieving faster convergence and better policies; because the reward is designed offline, it adds no latency to real-time decision-making. Experimental results show substantial latency reductions and robust performance across multi-user, multi-batch scenarios. Empirical lookup tables capture the resolution-aware trade-offs, and bandwidth and power sensitivity analyses indicate that onboard inference-as-a-service is practically viable in LAENets.

Abstract

The rapid advancement of Low-Altitude Economy Networks (LAENets) has enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection. To support these scenarios, unmanned aerial vehicles (UAVs) equipped with onboard vision-language models (VLMs) offer a promising solution for real-time multimodal inference. However, ensuring both inference accuracy and communication efficiency remains a significant challenge due to limited onboard resources and dynamic network conditions. In this paper, we first propose a UAV-enabled LAENet system model that jointly captures UAV mobility, user-UAV communication, and the onboard visual question answering (VQA) pipeline. Based on this model, we formulate a mixed-integer non-convex optimization problem to minimize task latency and power consumption under user-specific accuracy constraints. To solve the problem, we design a hierarchical optimization framework composed of two parts: (i) an Alternating Resolution and Power Optimization (ARPO) algorithm for resource allocation under accuracy constraints, and (ii) a Large Language Model-augmented Reinforcement Learning Approach (LLaRA) for adaptive UAV trajectory optimization. The large language model (LLM) serves as an expert in refining reward design of reinforcement learning in an offline fashion, introducing no additional latency in real-time decision-making. Numerical results demonstrate the efficacy of our proposed framework in improving inference performance and communication efficiency under dynamic LAENet conditions.
