
Prototype-Based Low Altitude UAV Semantic Segmentation

Da Zhang, Junyu Gao, Zhiyuan Zhao

Abstract

Semantic segmentation of low-altitude UAV imagery presents unique challenges due to extreme scale variations, complex object boundaries, and limited computational resources on edge devices. Existing transformer-based segmentation methods achieve remarkable performance but incur high computational overhead, while lightweight approaches struggle to capture fine-grained details in high-resolution aerial scenes. To address these limitations, we propose PBSeg, an efficient prototype-based segmentation framework tailored for UAV applications. PBSeg introduces a novel prototype-based cross-attention (PBCA) that exploits feature redundancy to reduce computational complexity while maintaining segmentation quality. The framework incorporates an efficient multi-scale feature extraction module that combines deformable convolutions (DConv) with context-aware modulation (CAM) to capture both local details and global semantics. Experiments on two challenging UAV datasets demonstrate the effectiveness of the proposed approach. PBSeg achieves 71.86% mIoU on UAVid and 80.92% mIoU on UDD6, establishing competitive performance while maintaining computational efficiency. Code is available at https://github.com/zhangda1018/PBSeg.
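To make the core idea of prototype-based cross-attention concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes one simple (hypothetical) way to exploit feature redundancy: pool the N pixel features into K prototypes (here by random grouping and mean pooling; the paper's PBCA may learn its prototypes differently), then let queries attend over the K prototypes instead of all N features, reducing attention cost from O(M·N·d) to O(M·K·d) with K ≪ N.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_cross_attention(queries, features, num_prototypes=8, seed=0):
    """Hypothetical sketch of prototype-based cross-attention.

    queries:  (M, d) learnable object embeddings
    features: (N, d) flattened pixel features, N >> num_prototypes
    Returns:  (M, d) attended outputs.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    # Compress N features into K prototypes by group-wise mean pooling:
    # O(N * d) instead of keeping all N tokens for attention.
    assign = rng.integers(0, num_prototypes, size=n)
    prototypes = np.stack([
        features[assign == k].mean(axis=0) if np.any(assign == k)
        else np.zeros(d)
        for k in range(num_prototypes)
    ])  # (K, d)
    # Standard scaled dot-product attention, now over only K prototypes.
    scores = queries @ prototypes.T / np.sqrt(d)   # (M, K)
    weights = softmax(scores, axis=-1)
    return weights @ prototypes                    # (M, d)
```

The design point the sketch illustrates is that attention cost now scales with the number of prototypes rather than the number of pixels, which is what makes the approach attractive for high-resolution UAV imagery on edge devices.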

Paper Structure

This paper contains 14 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Illustration of the main challenges in low-altitude UAV semantic segmentation. (a) Strong scale variation with small objects embedded in large-scale urban layouts. (b) Complex boundaries and crowded regions make small instances difficult to segment reliably. (c) Trade-off between segmentation accuracy and computational cost on UAVid.
  • Figure 2: Architecture of PBSeg. The backbone extracts features from the input image; the multi-scale decoder upsamples features to recover high-resolution representations; the transformer decoder takes as input a set of learnable object embeddings and the high-resolution features and produces refined queries for inference.
  • Figure 3: Scheme of the proposed prototype-based cross-attention.
  • Figure 4: Visualization comparisons on the UAVid dataset. Image denotes the input image and GT the ground truth.
  • Figure 5: Visualization comparisons on the UDD6 dataset. Image denotes the input image and GT the ground truth.