Camera-aware Label Refinement for Unsupervised Person Re-identification

Pengna Li, Kangyi Wu, Wenli Huang, Sanping Zhou, Jinjun Wang

TL;DR

This work tackles unsupervised person re-identification (Re-ID) under cross-camera distribution shifts and pseudo-label noise. It introduces Camera-Aware Label Refinement (CALR), which combines three components: intra-camera clustering to obtain reliable local pseudo labels, a pivot-based, self-paced inter-camera label refinement, and a camera-domain alignment module that uses a gradient reversal layer to reduce camera-induced feature distribution gaps. The method uses two-stage training with cluster memories and a refined inter-camera contrastive objective, and reports gains over both purely unsupervised and UDA baselines on Market-1501, DukeMTMC-ReID, MSMT17, VeRi-776, and a self-collected dataset. The results indicate that CALR produces accurate, camera-invariant representations that improve cross-camera Re-ID performance in realistic settings.

Abstract

Unsupervised person re-identification (Re-ID) aims to retrieve images of a specified person without identity labels. Many recent unsupervised Re-ID approaches adopt clustering-based methods that measure cross-camera feature similarity to roughly divide images into clusters. They ignore the feature distribution discrepancy induced by the camera domain gap, resulting in unavoidable performance degradation. Camera information is usually available, and the feature distribution within a single camera usually focuses more on the appearance of the individual and has less intra-identity variance. Inspired by this observation, we introduce a Camera-Aware Label Refinement (CALR) framework that reduces camera discrepancy by clustering intra-camera similarity. Specifically, we employ intra-camera training to obtain reliable local pseudo labels within each camera, then refine the global labels generated by inter-camera clustering and train the discriminative model using the more reliable global pseudo labels in a self-paced manner. Meanwhile, we develop a camera-alignment module to align feature distributions under different cameras, which further helps deal with camera variance. Extensive experiments validate the superiority of our proposed method over state-of-the-art approaches. The code is accessible at https://github.com/leeBooMla/CALR.
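
The refinement idea can be illustrated with a small sketch. It follows the pivot-based rule stated in the Figure 3 caption: within a pivot's global cluster, samples from the pivot's own camera are kept only if they also share the pivot's local cluster, while samples from other cameras are kept. The function name `refine_cluster`, the list-based inputs, and the toy data are assumptions made for illustration; the self-paced schedule and the contrastive loss are not modeled, and this is not the authors' implementation.

```python
def refine_cluster(pivot, global_labels, local_labels, cameras):
    """Return the indices kept in the pivot's refined cluster (hypothetical helper).

    global_labels: cluster id per sample from inter-camera clustering
    local_labels:  cluster id per sample from intra-camera clustering
                   (only comparable between samples of the same camera)
    cameras:       camera id per sample
    """
    kept = []
    for j, g in enumerate(global_labels):
        if g != global_labels[pivot]:
            continue  # not in the pivot's global cluster
        same_camera = cameras[j] == cameras[pivot]
        if same_camera and local_labels[j] != local_labels[pivot]:
            continue  # same camera, different local cluster: discard
        kept.append(j)
    return kept


# Toy data (assumed): six samples, two cameras, one large global cluster (id 7).
cameras = [0, 0, 0, 1, 1, 0]
local_labels = [0, 0, 1, 0, 1, 1]
global_labels = [7, 7, 7, 7, 7, 3]

refined = refine_cluster(0, global_labels, local_labels, cameras)
print(refined)
```

In this toy case sample 2 shares the pivot's camera and global cluster but was assigned to a different local cluster, so it is dropped from the refined cluster of pivot 0.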

Paper Structure (18 sections, 9 equations, 8 figures, 6 tables, 1 algorithm)

Figures (8)

  • Figure 1: (a) t-SNE visualization [van2008visualizing] of the feature distribution on Market-1501 [zheng2015scalable], where features are extracted using a ResNet-50 pre-trained on ImageNet [deng2009imagenet]. Each color denotes a different camera. Feature distributions are highly biased towards camera labels. Consequently, positive pairs captured by different cameras may be more dissimilar than negative pairs from the same camera, resulting in what we refer to as "IDs Merge", as shown in (b). To address this issue, we exploit more fine-grained and reliable local labels, generated in advance, to refine the global clusters.
  • Figure 2: Overview of our proposed CALR. The intra-camera training stage optimizes each camera-specific CNN with local clustering and saves the final clustering results. The inter-camera training stage performs global clustering over all samples. The label refinement procedure exploits the reliable local clusters to estimate pairwise relationships. The refined clusters are used to compute the inter-camera contrastive loss. We also perform camera domain classification on each feature embedding through the domain classifier and compute the domain classification loss.
  • Figure 3: Visualization of the global cluster, local cluster, and refined cluster. Given a pivot, the first row shows its global cluster and the second row shows its local cluster. Among the global cluster samples captured by the same camera as the pivot, we discard those that are not in the pivot's local cluster. The refined cluster is shown in the third row. Samples with red boxes are discarded, while those with green boxes are retained.
  • Figure 4: Example images from the self-collected real-world dataset and the challenges it presents.
  • Figure 5: Comparison of clustering quality for the global clusters, the local clusters, and the refined clusters. We use precision, recall, F-score, and expansion to analyze the clusters. Expansion is the average number of clusters to which an ID is assigned. The global clusters are obtained from inter-camera clustering on Market-1501. The local clusters are the final clustering results of the intra-camera training stage.
  • ...and 3 more figures
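
The cluster-quality metrics named in the Figure 5 caption can be sketched as follows. Expansion follows the caption's definition (average number of clusters per ID). The caption does not define precision, recall, or F-score, so the pairwise definitions used here (over pairs of samples placed in the same cluster) are an assumption, as are the function name `cluster_quality` and the toy labels.

```python
from itertools import combinations


def cluster_quality(true_ids, cluster_labels):
    """Pairwise precision/recall/F-score (assumed definitions) plus expansion."""
    pairs = list(combinations(range(len(true_ids)), 2))
    same_cluster = [(i, j) for i, j in pairs if cluster_labels[i] == cluster_labels[j]]
    same_id = [(i, j) for i, j in pairs if true_ids[i] == true_ids[j]]
    # True positives: pairs that share both a cluster and a ground-truth ID.
    tp = sum(1 for i, j in same_cluster if true_ids[i] == true_ids[j])

    precision = tp / len(same_cluster) if same_cluster else 0.0
    recall = tp / len(same_id) if same_id else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0

    # Expansion: average number of distinct clusters each ID is spread over.
    clusters_per_id = {}
    for identity, cluster in zip(true_ids, cluster_labels):
        clusters_per_id.setdefault(identity, set()).add(cluster)
    expansion = sum(len(c) for c in clusters_per_id.values()) / len(clusters_per_id)

    return {"precision": precision, "recall": recall,
            "f_score": f_score, "expansion": expansion}


# Toy labels (assumed): ID "a" is split across clusters 0 and 1,
# and cluster 1 mixes IDs "a" and "b".
true_ids = ["a", "a", "a", "b", "b"]
cluster_labels = [0, 0, 1, 1, 1]
metrics = cluster_quality(true_ids, cluster_labels)
print(metrics)
```

Under these definitions, a cluster that merges two IDs lowers precision, while an ID split over several clusters lowers recall and raises expansion.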