FedBPrompt: Federated Domain Generalization Person Re-Identification via Body Distribution Aware Visual Prompts

Xin Xu, Weilong Li, Wei Liu, Wenke Huang, Zhixi Yu, Bin Yang, Xiaoying Liao, Kui Jiang

Abstract

Federated Domain Generalization for Person Re-Identification (FedDG-ReID) learns domain-invariant representations from decentralized data. While Vision Transformer (ViT) is widely adopted, its global attention often fails to separate pedestrians from highly similar backgrounds or to handle diverse viewpoints, a challenge amplified by cross-client distribution shifts in FedDG-ReID. To address this, we propose Federated Body Distribution Aware Visual Prompt (FedBPrompt), which introduces learnable visual prompts to guide Transformer attention toward pedestrian-centric regions. FedBPrompt employs a Body Distribution Aware Visual Prompts Mechanism (BAPM) comprising Holistic Full Body Prompts, which suppress cross-client background noise, and Body Part Alignment Prompts, which capture fine-grained details robust to pose and viewpoint variations. To mitigate high communication costs, we design a Prompt-based Fine-Tuning Strategy (PFTS) that freezes the ViT backbone and updates only lightweight prompts, significantly reducing communication overhead while maintaining adaptability. Extensive experiments demonstrate that BAPM effectively enhances feature discrimination and cross-domain generalization, while PFTS achieves notable performance gains within only a few aggregation rounds. Moreover, both BAPM and PFTS can be easily integrated into existing ViT-based FedDG-ReID frameworks, making FedBPrompt a flexible and effective solution for federated person re-identification. The code is available at https://github.com/leavlong/FedBPrompt.
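
The PFTS idea described in the abstract (freeze the backbone, train and communicate only the prompts) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a FedAvg-style weighted average on the server, which the abstract does not specify, and it represents parameters as plain Python lists. The names `split_trainable`, `fedavg`, `pfts_round`, and the `"prompt."` key prefix are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A toy "model": parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def split_trainable(model: Params, prompt_prefix: str = "prompt.") -> Params:
    """Keep only prompt parameters; all other keys are treated as frozen.

    The "prompt." prefix is a hypothetical naming convention.
    """
    return {k: v for k, v in model.items() if k.startswith(prompt_prefix)}


def fedavg(updates: List[Params], weights: List[float]) -> Params:
    """Weighted average of client uploads, key by key.

    Assumption: FedAvg-style aggregation weighted by client data size.
    """
    total = sum(weights)
    averaged: Params = {}
    for key in updates[0]:
        dim = len(updates[0][key])
        averaged[key] = [
            sum(w * u[key][i] for u, w in zip(updates, weights)) / total
            for i in range(dim)
        ]
    return averaged


def pfts_round(
    global_model: Params,
    client_models: List[Params],
    client_sizes: List[float],
) -> Params:
    """One aggregation round in which only prompts leave the clients."""
    uploads = [split_trainable(m) for m in client_models]  # small payloads
    new_prompts = fedavg(uploads, client_sizes)
    merged = dict(global_model)   # backbone entries are copied unchanged
    merged.update(new_prompts)    # only prompt entries are overwritten
    return merged
```

In this sketch the server never receives backbone weights, so the per-round upload is limited to the prompt entries. A real system would operate on tensors and a framework-specific parameter dictionary instead of lists.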

Paper Structure (15 sections, 9 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: The Dual Challenges of Client Heterogeneity in Federated Person Re-ID. (a) In the FedDG-ReID setting, clients exhibit heterogeneous background and viewpoint distributions. (b) This data heterogeneity in turn leads to two critical failure modes: (Top) The model becomes easily distracted by dominant yet irrelevant backgrounds, causing false matches between different individuals. (Bottom) Diverse viewpoints severely misalign body parts of the same individual, which drastically reduces their feature similarity and results in mismatches.
  • Figure 2: An Overview of the Proposed FedBPrompt Framework. The framework consists of two main components. (Left) On each client, our FedBPrompt method injects learnable prompts to guide the model's attention toward pedestrian features and away from the background. The core BAPM then learns structured, part-level representations to solve the feature misalignment problem. (Right) The framework supports two training strategies. In Full-Parameter training, the entire model (~86M) is communicated. In contrast, our proposed PFTS freezes the backbone and only communicates the lightweight prompts (~0.46M), drastically reducing communication costs while maintaining high performance.
  • Figure 3: Attention maps on Market-1501 under severe cropping, misalignment, and occlusion. Unlike the scattered attention of the Baseline, our BAPM exhibits clear specialization: body part prompts ($\mathbf{P}^{\mathrm{upper}}$, etc.) localize specific regions, while holistic prompts ($\mathbf{P}^{\mathrm{full}}$) capture global features. This aggregation yields comprehensive and accurate BAPM attention.
  • Figure 4: t-SNE visualizations of feature distribution on CUHK02 (C2), CUHK03 (C3), and MSMT17 (MS).
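
The parameter counts quoted in the Figure 2 caption (about 86M for the full model, about 0.46M for the prompts) can be turned into a rough per-round upload estimate. The sketch below is only back-of-the-envelope arithmetic. It assumes 32-bit floats (4 bytes per parameter) and no compression, neither of which is stated in the source, and the helper names are hypothetical.

```python
# Approximate parameter counts taken from the Figure 2 caption.
FULL_PARAMS = 86e6
PROMPT_PARAMS = 0.46e6


def payload_megabytes(num_params: float, bytes_per_param: int = 4) -> float:
    """Upload size in MB (10**6 bytes) for one client in one round.

    Assumption: dense float32 storage, i.e. 4 bytes per parameter.
    """
    return num_params * bytes_per_param / 1e6


def reduction_factor(full_params: float, prompt_params: float) -> float:
    """How many times smaller the prompt-only payload is."""
    return full_params / prompt_params


if __name__ == "__main__":
    print(f"full model : {payload_megabytes(FULL_PARAMS):.2f} MB per upload")
    print(f"prompts    : {payload_megabytes(PROMPT_PARAMS):.2f} MB per upload")
    print(f"reduction  : {reduction_factor(FULL_PARAMS, PROMPT_PARAMS):.0f}x")
```

Under these assumptions a full-parameter upload is about 344 MB and a prompt-only upload about 1.84 MB, a reduction of roughly 187 times per client per round. The ratio itself does not depend on the bytes-per-parameter assumption.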