An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification
Lei Zhang, Xiaowei Fu, Fuxiang Huang, Yi Yang, Xinbo Gao
TL;DR
This work tackles the challenge of robust person ReID in open-world, cross-spatial-temporal, dynamic wild scenarios by introducing the OWD benchmark, which features 21 open-world scenes, 84 cameras, all-season daytime and nighttime data, and privacy-masked faces. To improve generalization, it proposes Latent Domain Expansion (LDE), a two-step method that first decouples identity-relevant and domain-relevant features via a dual-stream network with Domain Decouple Modules and Mutual Similarity Lifting-Suppression, then expands the latent domain space by adding Gaussian domain directions sampled from domain-wise covariances, optimized with an $L_{\infty}$-cross-entropy loss together with a Triplet loss: $\mathcal{L}_{All} = \mathcal{L}_{Tri} + \mathcal{L}_{CE}^{K \to \infty}$. Empirically, OWD demonstrates strong transferability across real-world, web, and synthetic data, and LDE achieves competitive or superior results on both small and large target domains, with visualization evidence showing clearer identity-focused features and cleaner domain cues after decoupling and expansion. The combination of a challenging open-world benchmark and a principled DG method provides a practical path toward scalable ReID in open-world deployments, with potential extensions to temporal information and clothing-invariant learning. The work also releases the OWD benchmark and LDE code to facilitate community progress toward robust open-world ReID.
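The expansion step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `strength` scale, the finite sample count `k`, and the choice of which domain's covariance each sample draws from (`sample_domains`) are all assumptions made for illustration. The loss in the summary takes $K \to \infty$, which suggests the method itself does not sample explicitly; the finite-`k` version below only shows what "adding Gaussian domain directions sampled from domain-wise covariances" could look like.

```python
import numpy as np


def domain_covariances(domain_feats, domain_labels):
    """Estimate one covariance matrix per source domain (hypothetical helper).

    domain_feats: (N, D) array of domain-relevant features.
    domain_labels: (N,) integer array of source-domain ids.
    """
    covs = {}
    for d in np.unique(domain_labels):
        feats = domain_feats[domain_labels == d]
        # rowvar=False: rows are samples, columns are feature dimensions.
        covs[int(d)] = np.cov(feats, rowvar=False)
    return covs


def expand_identity_features(id_feats, sample_domains, covs, k=4,
                             strength=0.5, rng=None):
    """Return k perturbed copies of each identity feature, shape (N, k, D).

    sample_domains[i] selects the domain whose covariance perturbs sample i;
    whether this should be the sample's own domain or another one is an
    assumption left to the caller.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, dim = id_feats.shape
    expanded = np.empty((n, k, dim))
    for i in range(n):
        cov = strength * covs[int(sample_domains[i])]
        # Zero-mean Gaussian "domain directions" for this sample.
        directions = rng.multivariate_normal(np.zeros(dim), cov, size=k)
        expanded[i] = id_feats[i] + directions
    return expanded


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    domain_labels = np.repeat([0, 1], 20)       # two toy source domains
    domain_feats = rng.normal(size=(40, 8))     # toy domain-relevant features
    id_feats = rng.normal(size=(40, 8))         # toy identity-relevant features
    covs = domain_covariances(domain_feats, domain_labels)
    expanded = expand_identity_features(id_feats, domain_labels, covs,
                                        k=3, rng=rng)
    print(expanded.shape)
```

Each perturbed copy keeps the identity label of its source feature, so a classifier trained on the expanded set sees the same identity under varied domain statistics.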
Abstract
Person re-identification (ReID) has made great strides thanks to data-driven deep learning techniques. However, existing benchmark datasets lack diversity, and models trained on them do not generalize well to dynamic wild scenarios. To improve the explicit generalization of ReID models, we develop a new Open-World, Diverse, Cross-Spatial-Temporal dataset named OWD with several distinct features. 1) Diverse collection scenes: multiple independent open-world and highly dynamic collection scenes, including streets, intersections, shopping malls, etc. 2) Diverse lighting variations: long time spans from daytime to nighttime with abundant illumination changes. 3) Diverse person status: multiple camera networks in all seasons with normal/adverse weather conditions and diverse pedestrian appearances (e.g., clothes, personal belongings, poses, etc.). 4) Protected privacy: invisible faces for privacy-critical applications. To improve the implicit generalization of ReID, we further propose a Latent Domain Expansion (LDE) method to exploit the potential of the source data. LDE decouples discriminative identity-relevant features from trustworthy domain-relevant features and implicitly enforces a domain-randomized expansion of the identity feature space with richer domain diversity, which facilitates domain-invariant representations. Our comprehensive evaluations on most benchmark datasets in the community are crucial for progress, although this work is still far from the grand goal of open-world and dynamic wild applications.
