ENTIRe-ID: An Extensive and Diverse Dataset for Person Re-Identification
Serdar Yildiz, Ahmet Nezih Kasim
TL;DR
The paper introduces ENTIRe-ID, a large-scale, diverse person re-identification (ReID) dataset designed to address domain shift and generalization in real-world settings. It describes a scalable data-collection pipeline that uses YOLOv8 to detect and crop people and ByteTrack to track them, producing 4.45 million images of 13,540 identities from 37 cameras across four continents, with 20–30 images per person and manual merging of identities across cameras. Cross-dataset evaluations with a strong vision-transformer baseline show that ENTIRe-ID maintains robust performance across domains, and an analysis with CLIP and ImageNet-based representations shows that it spans a wide region of feature space. A privacy-preserving analysis shows that blurring faces causes only a small drop in performance, supporting a privacy-conscious design. Overall, ENTIRe-ID provides a realistic benchmark intended to improve generalization in person ReID research and applications.
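The summary above does not say how images are selected from each track, so the following is only a minimal sketch of one plausible step: capping the number of crops kept per tracked person by sampling evenly over time. The function name `sample_track_crops`, the `(track_id, frame_idx, bbox)` tuple layout, and the even-spacing rule are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def sample_track_crops(detections, max_per_track=30):
    """Keep at most `max_per_track` detections per track, evenly spaced in time.

    `detections` is an iterable of (track_id, frame_idx, bbox) tuples, e.g. the
    output of a detector plus tracker. This layout is hypothetical.
    Returns a dict mapping track_id -> list of (frame_idx, bbox).
    """
    if max_per_track < 2:
        raise ValueError("max_per_track must be at least 2")

    # Group detections by track id.
    tracks = defaultdict(list)
    for track_id, frame_idx, bbox in detections:
        tracks[track_id].append((frame_idx, bbox))

    sampled = {}
    for track_id, items in tracks.items():
        items.sort(key=lambda item: item[0])  # order by frame index
        if len(items) <= max_per_track:
            sampled[track_id] = items  # short track: keep everything
            continue
        # Pick indices spread evenly from the first to the last detection.
        step = (len(items) - 1) / (max_per_track - 1)
        indices = [round(i * step) for i in range(max_per_track)]
        sampled[track_id] = [items[i] for i in indices]
    return sampled


if __name__ == "__main__":
    # Synthetic example: one long track and one short track.
    box = (0, 0, 64, 128)
    dets = [(1, f, box) for f in range(100)] + [(2, f, box) for f in range(25)]
    result = sample_track_crops(dets)
    print({tid: len(crops) for tid, crops in result.items()})
```

Spreading the samples across the whole track, rather than taking consecutive frames, is one simple way to favor pose and viewpoint variety within an identity; a real pipeline might also filter by detection confidence or crop size.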
Abstract
The growing importance of person re-identification in computer vision has highlighted the need for more extensive and diverse datasets. In response, we introduce the ENTIRe-ID dataset, an extensive collection comprising over 4.45 million images from 37 different cameras in varied environments. This dataset is uniquely designed to tackle the challenges of domain variability and model generalization, areas where existing datasets for person re-identification have fallen short. The ENTIRe-ID dataset stands out for its coverage of a wide array of real-world scenarios, encompassing various lighting conditions, angles of view, and diverse human activities. This design ensures a realistic and robust training platform for ReID models. The ENTIRe-ID dataset is publicly available at https://serdaryildiz.github.io/ENTIRe-ID
