Unsupervised Domain Adaptive Person Search via Dual Self-Calibration

Linfeng Qi, Huibing Wang, Jiqing Zhang, Jinjia Peng, Yang Wang

TL;DR

Unsupervised Domain Adaptive (UDA) person search suffers from noisy pseudo-labels caused by the gap between source and target domains. The paper introduces Dual Self-Calibration (DSCA), which combines a Perception-Driven Adaptive Filter (PDAF) for image-level feature purification with a Cluster Proxy Representation (CPR) for robust instance-level cluster updates. PDAF uses a Perception-Driven Threshold and a Self-Calibrating Filter to suppress background interference and retain foreground features, while CPR replaces instance-level memory with cluster proxies and updates them both online and offline to limit noise from mislabeled instances. DSCA reports 80.2% mAP and 81.7% top-1 on CUHK-SYSU and 39.9% mAP and 81.6% top-1 on PRW, which is state-of-the-art among unsupervised methods and comparable to some fully supervised approaches.

Abstract

Unsupervised Domain Adaptive (UDA) person search aims to adapt a model trained on a labeled source-domain dataset to a target-domain dataset without any additional annotations. The most effective UDA person search methods typically train with the ground truth of the source domain together with pseudo-labels derived from clustering on the target domain. However, the performance of these approaches is significantly limited by noisy pseudo-labels resulting from inter-domain disparities. In this paper, we propose a Dual Self-Calibration (DSCA) framework for UDA person search that effectively suppresses the interference of noisy pseudo-labels from both the image-level and instance-level feature perspectives. Specifically, we first present a simple yet effective Perception-Driven Adaptive Filter (PDAF) that adaptively predicts a dynamic filter threshold from the input features. This threshold helps eliminate noisy pseudo-boxes and other background interference, allowing our approach to focus on foreground targets and avoid indiscriminate domain adaptation. We further propose a Cluster Proxy Representation (CPR) module to improve the update strategy of cluster representations, which mitigates the pollution of clusters by misidentified instances and streamlines training on unlabeled target domains. With this design, our method achieves state-of-the-art (SOTA) performance on two benchmark datasets, with 80.2% mAP and 81.7% top-1 on the CUHK-SYSU dataset and 39.9% mAP and 81.6% top-1 on the PRW dataset, which is comparable to or even exceeds the performance of some fully supervised methods. Our source code is available at https://github.com/whbdmu/DSCA.
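
To make the cluster-proxy idea concrete, the following is a minimal sketch of a memory that stores one proxy per cluster instead of one entry per instance. It is not the authors' implementation: the mean-feature initialization, the momentum update rule, the default `momentum` value, the convention that label `-1` marks un-clustered outliers, and the function names are all assumptions borrowed from common cluster-memory practice, since this section does not give the paper's update equations.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def init_cluster_proxies(features, pseudo_labels):
    """Offline step (assumed): one proxy per cluster, the normalized mean feature.

    Instances labeled -1 are treated here as un-clustered outliers and skipped.
    """
    sums, counts = {}, {}
    for feat, label in zip(features, pseudo_labels):
        if label < 0:
            continue
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: l2_normalize([s / counts[label] for s in acc])
        for label, acc in sums.items()
    }


def momentum_update(proxies, feature, label, momentum=0.2):
    """Online step (assumed): blend one instance feature into its cluster proxy."""
    feat = l2_normalize(feature)
    blended = [
        momentum * p + (1.0 - momentum) * f
        for p, f in zip(proxies[label], feat)
    ]
    proxies[label] = l2_normalize(blended)
    return proxies[label]
```

Because each update moves a shared proxy only partway toward a single instance, one misidentified instance shifts the cluster representation less than it would if it overwrote its own memory slot. This is the intuition behind the noise suppression described above, under the stated assumptions.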

Paper Structure

This paper contains 19 sections, 10 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Noisy pseudo-labels consist of low-quality pseudo bounding boxes and misidentified pseudo identities. The quality of the bounding boxes depends on unsupervised detection, while the reliability of the identities depends on the clustering features.
  • Figure 2: The design architecture of the DSCA framework. For each training period, DSCA alternates between two phases: (1) Cluster Proxy Representation Pipeline. Using RPN-generated proposal boxes to annotate unlabeled samples in the target domain, the annotated samples are clustered to assign pseudo-labels and initialize the cluster proxy dictionary. (2) Perception-Driven Adaptive Filter. Image-level features that retain valid foreground information after PDAF purification are used for downstream domain alignment and person search tasks.
  • Figure 3: Comparison of our High-Order Soft Threshold with two classical threshold functions, taking $n=2$ as an example.
  • Figure 4: Visualization of the detection scores and re-identification scores on the CUHK-SYSU dataset. Purple and green dots indicate the results of DAPS and our method, respectively.
  • Figure 5: Qualitative comparison of DSCA with DAPS on the CUHK-SYSU test set. The green bounding boxes denote the queries, while the red and orange bounding boxes denote incorrect and correct matches, respectively.
  • ...and 2 more figures
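
As context for Figure 3, the caption does not name the two classical threshold functions. The sketch below assumes they are the standard hard and soft thresholding functions; the parameter name `tau` is a placeholder. The High-Order Soft Threshold itself is not defined in this section, so it is not reproduced here.

```python
def hard_threshold(x, tau):
    """Keep x unchanged when |x| exceeds tau, otherwise set it to zero."""
    return x if abs(x) > tau else 0.0


def soft_threshold(x, tau):
    """Zero values with |x| <= tau and shrink the rest toward zero by tau."""
    magnitude = max(abs(x) - tau, 0.0)
    return magnitude if x >= 0 else -magnitude
```

Hard thresholding is discontinuous at `tau`, while soft thresholding is continuous but biases every surviving value by `tau`. A threshold function placed between the two would presumably trade off these properties.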