Fast One-Stage Unsupervised Domain Adaptive Person Search

Tianxiang Cui, Huibing Wang, Jinjia Peng, Ruoxi Deng, Xianping Fu, Yang Wang

TL;DR

FOUS tackles unsupervised domain adaptive person search by replacing costly clustering with a prototype-guided labeling strategy and by integrating an Attention-based Domain Alignment Module (ADAM) into a one-stage end-to-end framework. The approach jointly handles domain alignment for detection and ReID and progressively refines coarse labels through a label-flexible training regime, enabling efficient cross-domain generalization without target-domain annotations. Empirically, FOUS achieves state-of-the-art performance on CUHK-SYSU and PRW while substantially reducing computation compared with clustering-based methods, highlighting its practical value for real-world surveillance where labeled target data are unavailable. The work provides a scalable, faster alternative to multi-stage domain adaptation pipelines and releases the code for broader adoption.

Abstract

Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations, which is extremely challenging due to the unexpected variations of the unlabeled domains. However, most existing methods are dedicated to developing multi-stage models that adapt to domain variations while using clustering for iterative model training, which inevitably increases model complexity. To address this issue, we propose Fast One-stage Unsupervised person Search (FOUS), which complementarily integrates domain adaptation with label adaptation in an end-to-end manner without iterative clustering. To minimize the domain discrepancy, FOUS introduces an Attention-based Domain Alignment Module (ADAM), which not only aligns the domains for both detection and ReID tasks but also constructs an attention mechanism to reduce the adverse impact of low-quality candidates resulting from unsupervised detection. Moreover, to avoid redundant iterative clustering, FOUS adopts a prototype-guided labeling method that minimizes redundant correlation computations for partial samples and efficiently assigns noisy coarse label groups. The coarse label groups are continuously refined via a label-flexible training network with an adaptive selection strategy. With the adapted domains and labels, FOUS achieves state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU and PRW. The code is available at https://github.com/whbdmu/FOUS.
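As a rough illustration of the prototype-guided labeling idea, the sketch below assigns coarse labels by comparing every feature against a small set of randomly chosen prototype features (the Figure 2 caption mentions selecting random features as target prototype vectors). It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `assign_coarse_labels`, the use of cosine similarity, and the nearest-prototype rule are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def assign_coarse_labels(features, num_prototypes, seed=0):
    """Assign each feature a coarse label: the index of its most similar prototype.

    Hypothetical helper for illustration only; the similarity measure and the
    assignment rule are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # L2-normalize so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / (norms + 1e-12)

    # Pick K distinct samples at random and use their features as prototypes.
    proto_idx = rng.choice(len(normed), size=num_prototypes, replace=False)
    prototypes = normed[proto_idx]

    # (N, K) similarity matrix: N*K comparisons instead of the N*N
    # pairwise comparisons a clustering step would typically need.
    similarity = normed @ prototypes.T

    # Coarse (possibly noisy) label = index of the closest prototype.
    labels = similarity.argmax(axis=1)
    return labels, proto_idx, similarity


# Toy usage with random vectors standing in for unlabeled ReID features.
demo_features = np.random.default_rng(1).normal(size=(200, 16))
demo_labels, demo_proto_idx, demo_sim = assign_coarse_labels(demo_features, 10)
```

With N samples and K prototypes, the similarity matrix has N×K entries, which is where the saving over full pairwise similarity comes from when K is much smaller than N. The resulting labels are only a coarse starting point; in FOUS they are subsequently refined during training.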

Paper Structure

This paper contains 12 sections, 18 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Differences in operation between our proposed method and mainstream methods. (a) shows that the clustering algorithm uses the original data to calculate similarity in every iteration. (b) shows that our proposed method uses the original data only in the first iteration to assign soft labels, and then gradually refines the labels.
  • Figure 2: The architecture of the FOUS framework. In each iteration, after feature extraction, FOUS alternates between two phases: (1) Attention-based Domain Alignment (Sec. 2.2): the Multi-Information Perception Attention Module improves candidate box quality, followed by domain alignment operations. (2) Unlabeled Target Domain Training (Sec. 2.3): the unlabeled samples in the target domain are annotated using prototypes as the reference for further fine-tuning, where random features are selected as the target prototype vectors and the learned source prototypes are updated.
  • Figure 3: The channel-level information aggregation and spatial-level information aggregation in ADAM.
  • Figure 4: The channel-level information interaction and spatial-level information interaction in ADAM.
  • Figure 5: (a) shows the influence of varying the number of random prototypes on the mAP and top-1 metrics on the CUHK-SYSU dataset. (b) shows the impact of different pre-training rounds on the two datasets.