LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification

Pingping Zhang; Xiang Hu; Yuhao Wang; Huchuan Lu

TL;DR

The paper tackles cross-view Aerial-Ground Person Re-Identification (AG-ReID) by using stable human attributes as textual cues. It introduces LATex, a prompt-tuning framework that combines an Attribute-aware Image Encoder (AIE), a Prompted Attribute Classifier Group (PACG), and a Coupled Prompt Template (CPT) to turn attribute information and view context into structured sentences, which CLIP's text encoder then processes. This design makes explicit use of attribute-based textual knowledge and achieves strong performance across AG-ReID benchmarks while training far fewer parameters than full fine-tuning. The results also show robustness when attributes are missing, gains in efficiency, and qualitative insight into how attributes drive discrimination in cross-view person retrieval.
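To make the idea of turning attributes and view context into a structured sentence concrete, here is a minimal Python sketch. It works on plain text, which is an assumption made for readability: the template wording, the `X` placeholder for learnable context tokens, and the function `build_prompt` are all hypothetical and are not taken from the LATex code. Omitting the attribute clause when no attributes are given is likewise only one illustrative way to handle an attribute-missing input.

```python
from typing import Dict

# Hypothetical placeholder standing in for one learnable context token.
CTX_TOKEN = "X"
VIEWS = ("aerial", "ground")


def build_prompt(view: str, attributes: Dict[str, str], n_ctx: int = 4) -> str:
    """Assemble a structured sentence from a camera view and predicted attributes.

    Illustrative template only: in a prompt-tuned model, the CTX_TOKEN
    placeholders would be learnable embeddings rather than literal words.
    """
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    if n_ctx < 1:
        raise ValueError("n_ctx must be at least 1")
    ctx = " ".join([CTX_TOKEN] * n_ctx)
    sentence = f"A photo of a {ctx} person from the {view} view"
    if attributes:
        # Sort by attribute name so identical predictions give identical sentences.
        attr_phrase = ", ".join(
            f"{name}: {value}" for name, value in sorted(attributes.items())
        )
        sentence += f", with {attr_phrase}"
    # Assumption: when no attributes are available, the attribute clause is simply omitted.
    return sentence + "."
```

For example, `build_prompt("aerial", {"gender": "female", "upper color": "red"}, n_ctx=2)` returns `"A photo of a X X person from the aerial view, with gender: female, upper color: red."`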

Abstract

As an important task in intelligent transportation systems, Aerial-Ground person Re-IDentification (AG-ReID) aims to retrieve specific persons across heterogeneous cameras with different viewpoints. Previous methods typically adopt deep learning-based models that focus on extracting view-invariant features. However, they usually overlook the semantic information in person attributes. In addition, existing training strategies often rely on fully fine-tuning large-scale models, which significantly increases training costs. To address these issues, we propose a novel framework named LATex for AG-ReID, which adopts prompt-tuning strategies to leverage attribute-based text knowledge. Specifically, with the Contrastive Language-Image Pre-training (CLIP) model, we first propose an Attribute-aware Image Encoder (AIE) to extract both global semantic features and attribute-aware features from input images. Then, with these features, we propose a Prompted Attribute Classifier Group (PACG) to predict person attributes and obtain attribute representations. Finally, we design a Coupled Prompt Template (CPT) to transform attribute representations and view information into structured sentences. These sentences are processed by the text encoder of CLIP to generate more discriminative features. As a result, our framework can fully leverage attribute-based text knowledge to improve AG-ReID performance. Extensive experiments on three AG-ReID benchmarks demonstrate the effectiveness of our proposed method. The source code is available at https://github.com/kevinhu314/LATex.
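The PACG and CPT steps can be pictured with the minimal sketch below, written in plain Python so it runs without a deep-learning library. It assumes one reading of the abstract: each attribute has its own linear classifier over an attribute-aware feature vector (standing in for the AIE output), the predicted class probabilities weight a set of learnable class embeddings to give that attribute's representation, and the coupled template concatenates learnable context tokens, a view token, and one token per attribute before the sequence would be passed to CLIP's text encoder. The names `attribute_representation` and `coupled_prompt`, the probability-weighted embedding, and the token ordering are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    # One weight row per attribute class; bias omitted for brevity.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def attribute_representation(
    feature: Sequence[float], weights: Matrix, class_embeddings: Matrix
) -> Vector:
    """Classify one attribute, then return the probability-weighted class embedding.

    Hypothetical design: the paper may derive attribute representations differently.
    """
    probs = softmax(linear(weights, feature))
    dim = len(class_embeddings[0])
    return [
        sum(p * emb[d] for p, emb in zip(probs, class_embeddings)) for d in range(dim)
    ]


def coupled_prompt(
    ctx_tokens: Sequence[Vector],
    view_token: Vector,
    attr_features: Sequence[Sequence[float]],
    classifiers: Sequence[Tuple[Matrix, Matrix]],
) -> List[Vector]:
    """Build the token sequence: context tokens, then the view token, then one token per attribute."""
    if len(attr_features) != len(classifiers):
        raise ValueError("need one classifier per attribute feature")
    attr_tokens = [
        attribute_representation(feat, weights, embeds)
        for feat, (weights, embeds) in zip(attr_features, classifiers)
    ]
    return [list(t) for t in ctx_tokens] + [list(view_token)] + attr_tokens
```

Using soft probabilities rather than a hard argmax keeps each attribute token differentiable with respect to its classifier weights, which is one reason a sketch like this might prefer it; the paper may make a different choice.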
