An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification

Lei Zhang, Xiaowei Fu, Fuxiang Huang, Yi Yang, Xinbo Gao

TL;DR

This work tackles the challenge of robust person ReID in open-world, cross-spatial-temporal, dynamic wild scenarios by introducing the OWD benchmark, which features 21 open-world scenes, 84 cameras, all-season daytime and nighttime data, and privacy-masked faces. To improve generalization, it proposes Latent Domain Expansion (LDE), a two-step method that first decouples identity-relevant and domain-relevant features via a dual-stream network with Domain Decouple Modules and Mutual Similarity Lifting-Suppression, then expands the latent domain space by adding Gaussian domain directions sampled from domain-wise covariances, optimized with an $L_{\infty}$-cross-entropy loss together with a Triplet loss: $\mathcal{L}_{All} = \mathcal{L}_{Tri} + \mathcal{L}_{CE}^{K \to \infty}$. Empirically, OWD demonstrates strong transferability across real-world, web, and synthetic data, and LDE achieves competitive or superior results on both small and large target domains, with visualization evidence showing clearer identity-focused features and cleaner domain cues after decoupling and expansion. The combination of a challenging open-world benchmark and a principled DG method provides a practical path toward scalable ReID in open-world deployments, with potential extensions to temporal information and clothing-invariant learning. The work also releases the OWD benchmark and LDE code to facilitate community progress toward robust open-world ReID.
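
The expansion step described above (adding Gaussian domain directions sampled from domain-wise covariances to identity features) can be illustrated with a minimal NumPy sketch. This is not the authors' released LDE code. The function names `domain_covariances` and `expand_identity_features`, the parameters `k` and `strength`, and the use of a finite number of explicit samples are all assumptions made for illustration. The $K \to \infty$ superscript in the loss suggests the paper works with the limit of infinitely many samples, whereas this sketch simply draws `k` samples per feature.

```python
import numpy as np


def domain_covariances(domain_feats, domain_labels):
    """Estimate one covariance matrix per source domain.

    domain_feats: (N, D) domain-relevant features (hypothetical input).
    domain_labels: (N,) integer domain id per sample.
    Assumes every domain has at least two samples.
    """
    covs = {}
    for d in np.unique(domain_labels):
        x = domain_feats[domain_labels == d]
        # rowvar=False: rows are samples, columns are feature dimensions.
        covs[int(d)] = np.cov(x, rowvar=False)
    return covs


def expand_identity_features(id_feats, domain_labels, covs, k=4, strength=0.5, rng=None):
    """Add k zero-mean Gaussian domain directions to each identity feature.

    Returns an array of shape (N, k, D). `strength` scales the covariance
    and is an illustrative knob, not a parameter taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, dim = id_feats.shape
    out = np.empty((n, k, dim))
    for i in range(n):
        cov = strength * covs[int(domain_labels[i])]
        directions = rng.multivariate_normal(np.zeros(dim), cov, size=k)
        out[i] = id_feats[i] + directions
    return out


# Toy usage with random stand-ins for the two decoupled feature streams.
rng = np.random.default_rng(0)
domain_labels = np.array([0] * 8 + [1] * 8)
domain_feats = rng.normal(size=(16, 4))
id_feats = rng.normal(size=(16, 4))

covs = domain_covariances(domain_feats, domain_labels)
expanded = expand_identity_features(id_feats, domain_labels, covs, k=3, rng=rng)
print(expanded.shape)  # (16, 3, 4)
```

In a training loop, each expanded feature would keep the identity label of the feature it was derived from, so a classifier sees the same identity under several simulated domain shifts.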

Abstract

Person re-identification (ReID) has made great strides thanks to data-driven deep learning techniques. However, existing benchmark datasets lack diversity, and models trained on them cannot generalize well to dynamic wild scenarios. To meet the goal of improving the explicit generalization of ReID models, we develop a new Open-World, Diverse, Cross-Spatial-Temporal dataset named OWD with several distinct features. 1) Diverse collection scenes: multiple independent open-world and highly dynamic collecting scenes, including streets, intersections, shopping malls, etc. 2) Diverse lighting variations: long time spans from daytime to nighttime with abundant illumination changes. 3) Diverse person status: multiple camera networks in all seasons with normal/adverse weather conditions and diverse pedestrian appearances (e.g., clothes, personal belongings, poses, etc.). 4) Protected privacy: invisible faces for privacy-critical applications. To improve the implicit generalization of ReID, we further propose a Latent Domain Expansion (LDE) method to develop the potential of source data, which decouples discriminative identity-relevant and trustworthy domain-relevant features and implicitly enforces domain-randomized identity feature space expansion with richer domain diversity to facilitate domain-invariant representations. Our comprehensive evaluations with most benchmark datasets in the community are crucial for progress, although this work remains far from the grand goal of open-world and dynamic wild applications.

Paper Structure (23 sections, 10 equations, 14 figures, 11 tables, 1 algorithm)

This paper contains 23 sections, 10 equations, 14 figures, 11 tables, 1 algorithm.

Figures (14)

  • Figure 1: Statistics of existing person ReID datasets. The digit above each circle is the number of data collection scenes, the sky-blue circles are real data and the grass-green circles represent synthetic data
  • Figure 2: Examples of the existing datasets from different sources (real-world data, web data and synthetic data). Compared to real-world benchmarks, OWD is more diverse and challenging. Web data is well-lit, clearer and finer, which limits its diversity. A large domain shift is observed between synthetic and real-world data. Pedestrian faces obtained from real-world scenes are mosaicked to avoid privacy issues
  • Figure 3: Examples from OWD with various scenes, camera views, lighting conditions and clothing changes, etc
  • Figure 4: Statistics of the collected OWD
  • Figure 5: Several collection scenarios of web data, synthetic data and our OWD. Web scenes are usually clear and refined owing to their broadcasting purpose. Moreover, due to a large number of close-up shots, pedestrian samples tend to have high resolution. There is a large domain gap between synthetic data and real-world data. OWD is more diverse. To avoid privacy issues, faces are mosaicked
  • ...and 9 more figures