PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation

Deyi Ji; Wenwei Jin; Hongtao Lu; Feng Zhao

TL;DR

This work tackles UAV image segmentation under dynamic viewpoints by introducing PPTFormer, a lightweight Transformer-based framework that learns pseudo multi-perspective representations without multi-perspective labels. It combines contourlet-based texture extraction, perspective prototypes, and an iterative PMP Attention mechanism to fuse original and pseudo perspectives, with Perspective Calibration to prevent domain shifts. The approach achieves state-of-the-art performance across five UAV datasets while maintaining efficiency, demonstrating the practicality of perspective-aware learning for robust UAV scene understanding. Overall, PPTFormer offers a scalable pathway to enhanced UAV segmentation by simulating diverse viewpoints and aligning multi-perspective features during training.

Abstract

The ascension of Unmanned Aerial Vehicles (UAVs) in various fields necessitates effective UAV image segmentation, which faces challenges due to the dynamic perspectives of UAV-captured images. Traditional segmentation algorithms falter as they cannot accurately mimic the complexity of UAV perspectives, and the cost of obtaining multi-perspective labeled datasets is prohibitive. To address these issues, we introduce PPTFormer, a novel Pseudo Multi-Perspective Transformer network that revolutionizes UAV image segmentation. Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning. The PPTFormer network boasts Perspective Representation, novel Perspective Prototypes, and a specialized encoder and decoder that together achieve superior segmentation results through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Our experiments demonstrate that PPTFormer achieves state-of-the-art performance across five UAV segmentation datasets, confirming its capability to effectively simulate UAV flight perspectives and significantly advance segmentation precision. This work presents a pioneering leap in UAV scene understanding and sets a new benchmark for future developments in semantic segmentation.

Paper Structure (35 sections, 11 equations, 5 figures, 4 tables)

Introduction
Related Work
Semantic Segmentation
UAV Scene Segmentation
Ultra-High Resolution Segmentation
PPTFormer
Overall Structure
PPTFormer Block
Efficient Perspective Generation
Perspective Representation
Texture Extraction
Perspective Support Description
Perspective Prototypes Construction
Pseudo Perspective Generation
Pseudo Multi-Perspective Attention
...and 20 more sections

Figures (5)

Figure 1: The overview of the proposed PPTFormer. The encoder comprises four Transformer Blocks: one Plain Transformer Block followed by three PPTFormer Blocks. The former is responsible for extracting basic low-level information, which serves as the foundation for pseudo multi-perspective learning in the subsequent blocks. Interleaved between the PPTFormer Blocks are Perspective Calibration modules, which prevent potential scene domain shifts. Finally, we concatenate features of varying scales produced by each block and feed them into the decoder network.
Figure 2: The Efficient Perspective Generation module in PPTFormer Block.
Figure 3: The detailed process of Perspective Representation.
Figure 4: Layer number in perspective calibration.
Figure 5: The quantity of perspective prototypes.
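
The encoder layout described in the Figure 1 caption can be sketched at the level of stage ordering and feature shapes. This is a minimal illustration, not the authors' implementation: the names `Stage`, `build_encoder_plan`, and `decoder_input_shape` are hypothetical, the strides and channel widths are assumed placeholder values, and resizing every block output to the finest scale before concatenation is one plausible reading of "concatenate features of varying scales". Placing exactly two Perspective Calibration modules (one between each pair of consecutive PPTFormer Blocks) is likewise an interpretation of "interleaved".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str            # "plain", "pptformer", or "calibration"
    stride: int          # assumed cumulative downsampling factor (hypothetical)
    channels: int        # assumed output channel width (hypothetical)
    emits_feature: bool  # True if this stage's output is passed to the decoder


def build_encoder_plan(strides=(4, 8, 16, 32), channels=(32, 64, 160, 256)):
    """Order the stages as the Figure 1 caption describes them."""
    # One Plain Transformer Block extracts basic low-level information.
    plan = [Stage("plain", strides[0], channels[0], True)]
    # Three PPTFormer Blocks follow.
    for i in range(1, 4):
        if i > 1:
            # Assumption: a calibration module sits between consecutive
            # PPTFormer Blocks and keeps the previous block's shape.
            plan.append(Stage("calibration", strides[i - 1], channels[i - 1], False))
        plan.append(Stage("pptformer", strides[i], channels[i], True))
    return plan


def decoder_input_shape(plan, height, width):
    """Return (C, H, W) of the concatenated multi-scale features.

    Assumption: each block output is resized to the finest scale and the
    results are concatenated along the channel axis.
    """
    feats = [s for s in plan if s.emits_feature]
    finest = min(s.stride for s in feats)
    return (sum(s.channels for s in feats), height // finest, width // finest)


plan = build_encoder_plan()
print([s.name for s in plan])
print(decoder_input_shape(plan, 512, 512))
```

Keeping the plan as plain data makes it easy to vary the assumed widths or strides and see how the decoder input changes, without committing to any particular attention or calibration implementation.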
