PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation
Deyi Ji, Wenwei Jin, Hongtao Lu, Feng Zhao
TL;DR
This work tackles UAV image segmentation under dynamic viewpoints by introducing PPTFormer, a lightweight Transformer-based framework that learns pseudo multi-perspective representations without multi-perspective labels. It combines contourlet-based texture extraction, perspective prototypes, and an iterative PMP Attention mechanism to fuse original and pseudo perspectives, with Perspective Calibration to prevent domain shifts. The approach achieves state-of-the-art performance across five UAV datasets while maintaining efficiency, demonstrating the practicality of perspective-aware learning for robust UAV scene understanding. Overall, PPTFormer offers a scalable pathway to enhanced UAV segmentation by simulating diverse viewpoints and aligning multi-perspective features during training.
Abstract
The rise of Unmanned Aerial Vehicles (UAVs) in various fields necessitates effective UAV image segmentation, which is challenging because of the dynamic perspectives of UAV-captured images. Traditional segmentation algorithms falter as they cannot accurately mimic the complexity of UAV perspectives, and the cost of obtaining multi-perspective labeled datasets is prohibitive. To address these issues, we introduce PPTFormer, a novel Pseudo Multi-Perspective Transformer network for UAV image segmentation. Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning. PPTFormer comprises Perspective Representation, novel Perspective Prototypes, and a specialized encoder and decoder that together achieve superior segmentation results through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Our experiments demonstrate that PPTFormer achieves state-of-the-art performance across five UAV segmentation datasets, confirming its capability to effectively simulate UAV flight perspectives and significantly advance segmentation precision. This work presents a pioneering leap in UAV scene understanding and sets a new benchmark for future developments in semantic segmentation.
