Task-Oriented Communications for Visual Navigation with Edge-Aerial Collaboration in Low Altitude Economy

Zhengru Fang, Zhenghao Liu, Jingjing Wang, Senkang Hu, Yu Guo, Yiqin Deng, Yuguang Fang

TL;DR

This work tackles GNSS-denied urban UAV localization under the bandwidth constraints typical of the Low Altitude Economy (LAE). It introduces O-VIB, a task-oriented encoder that combines Automatic Relevance Determination (ARD) with an orthogonality constraint within the Variational Information Bottleneck framework to produce ultra-compact, informative latent representations from multi-view imagery. The approach jointly encodes five camera views, transmits a compact latent representation Z_t to edge servers, and fuses the edge-side inferences to estimate the UAV pose, trading off transmission rate against accuracy. It is evaluated on a CARLA-derived dataset and in real hardware experiments. Results show localization error below 10 m at throughput below 10 KB/s and substantially lower latency than conventional codecs, supporting the practical viability of edge-aerial task-oriented communication (TOC) for LAE. The work provides a reusable dataset and codebase for research in task-oriented aerial communications and edge-assisted visual navigation, aimed at scalable, bandwidth-efficient UAV operations in urban environments.

Abstract

To support the Low Altitude Economy (LAE), it is essential to achieve precise localization of unmanned aerial vehicles (UAVs) in urban areas where global positioning system (GPS) signals are unavailable. Vision-based methods offer a viable alternative but face severe bandwidth, memory and processing constraints on lightweight UAVs. Inspired by mammalian spatial cognition, we propose a task-oriented communication framework, where UAVs equipped with multi-camera systems extract compact multi-view features and offload localization tasks to edge servers. We introduce the Orthogonally-constrained Variational Information Bottleneck encoder (O-VIB), which incorporates automatic relevance determination (ARD) to prune non-informative features while enforcing orthogonality to minimize redundancy. This enables efficient and accurate localization with minimal transmission cost. Extensive evaluation on a dedicated LAE UAV dataset shows that O-VIB achieves high-precision localization under stringent bandwidth budgets. Code and dataset will be made publicly available at: github.com/fangzr/TOC-Edge-Aerial.
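The abstract describes O-VIB as a Variational Information Bottleneck encoder with ARD pruning and an orthogonality constraint. The following is a minimal NumPy sketch of how such an objective could be assembled, not the authors' implementation. All function names, the weights `beta` and `lam`, and the threshold `log_alpha_max` are hypothetical. The rate term uses a standard-normal KL as a stand-in for whichever prior the paper adopts, and the pruning rule (dropping latent dimensions whose noise-to-signal ratio is large) is borrowed from sparse variational dropout as an assumption.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)


def orthogonality_penalty(w):
    """Squared Frobenius norm of (W W^T - I); zero iff the rows of W are orthonormal."""
    gram = w @ w.T
    return float(np.sum((gram - np.eye(w.shape[0])) ** 2))


def ard_keep_mask(mu, log_var, log_alpha_max=3.0):
    """Keep latent dimensions whose batch-averaged log(sigma^2 / mu^2) is below a threshold.

    Assumed pruning rule: a large log-ratio means the dimension is noise-dominated.
    """
    log_alpha = log_var - np.log(mu ** 2 + 1e-8)
    return log_alpha.mean(axis=0) < log_alpha_max


def ovib_style_loss(pos_pred, pos_true, mu, log_var, w, beta=1e-3, lam=1e-2):
    """Task error + beta * rate + lam * redundancy penalty (hypothetical weighting)."""
    task = np.mean(np.sum((pos_pred - pos_true) ** 2, axis=1))  # squared position error
    rate = np.mean(gaussian_kl(mu, log_var))                    # stand-in rate term
    return task + beta * rate + lam * orthogonality_penalty(w)
```

In this sketch, `w` stands for a hypothetical projection matrix of shape (latent dimensions, input features), and only the dimensions selected by `ard_keep_mask` would be transmitted to the edge server.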

Paper Structure

This paper contains 15 sections, 4 theorems, 22 equations, 8 figures.

Key Result

Lemma 1

Let $q_{\phi}(\mathbf z\mid\mathbf x)$ be any encoder and let $p(\mathbf z)$ be an arbitrary prior. Define the variational mutual information $I_{q_{\phi}}(\mathbf x;\mathbf z) := \mathrm{KL}\bigl(q_{\phi}(\mathbf x,\mathbf z)\,\Vert\,q_{\phi}(\mathbf x)\,q_{\phi}(\mathbf z)\bigr)$ and the marginal … If $p(\mathbf z)$ is chosen coordinate-wise log-uniform ($p(z_i)\propto|z_i|^{-1}$) and $q_{\phi}$ …
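
The lemma statement is cut off in this excerpt, so its conclusion is not reproduced here. As background only, the mutual information defined in the lemma satisfies a standard identity relating it to the KL term against a prior. It holds whenever the KL terms involved are finite, and whether the lemma itself uses this form is an assumption:

$$
I_{q_{\phi}}(\mathbf x;\mathbf z)
= \mathbb{E}_{q_{\phi}(\mathbf x)}\!\left[\mathrm{KL}\bigl(q_{\phi}(\mathbf z\mid\mathbf x)\,\Vert\,p(\mathbf z)\bigr)\right]
- \mathrm{KL}\bigl(q_{\phi}(\mathbf z)\,\Vert\,p(\mathbf z)\bigr)
\;\le\;
\mathbb{E}_{q_{\phi}(\mathbf x)}\!\left[\mathrm{KL}\bigl(q_{\phi}(\mathbf z\mid\mathbf x)\,\Vert\,p(\mathbf z)\bigr)\right].
$$

The inequality follows because the subtracted KL divergence is non-negative, which is why the expected KL against the prior is commonly used as a tractable upper bound on the rate.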

Figures (8)

  • Figure 1: The system model of Edge-aerial collaboration.
  • Figure 2: Feature-extraction and task-oriented compression pipeline executed on board the UAV.
  • Figure 3: Edge-side decoding and position-prediction pipeline running on RSU servers.
  • Figure 4: Multi-camera UAV perception system and corresponding visual observations.
  • Figure 5: Edge-enhanced UAV platform with integrated multiview perception and computing modules.
  • ...and 3 more figures

Theorems & Definitions (7)

  • Lemma 1
  • Proof 1
  • Lemma 2
  • Proof 2
  • Theorem 1
  • Proposition 1
  • Proof 1