Contrastive Multiple Instance Learning for Weakly Supervised Person ReID
Jacob Tyo, Zachary C. Lipton
TL;DR
This work addresses weakly supervised person re-identification (ReID) by introducing Contrastive Multiple Instance Learning (CMIL), a framework that learns from bag-level labels without pseudo-labels. CMIL uses a ResNet-50-based per-image encoder and a permutation-invariant set-transformer aggregator to produce bag representations. It is optimized with a weighted combination of a triplet loss, an identity (cross-entropy) loss, and an optional alignment loss: $\mathcal{L} = \alpha \mathcal{L}_{\mathrm{triplet}} + \beta \mathcal{L}_{\mathrm{CE}} + \gamma \mathcal{L}_{\mathrm{align}}$. The authors release WL-MUDD, a real-world weakly labeled ReID dataset, and evaluate CMIL on WL-Market1501, WL-MUDD, and SYSU-30k. On these datasets it consistently achieves state-of-the-art or near-state-of-the-art performance under weak supervision. Two key findings stand out. First, the alignment loss is empirically ineffective. Second, simple aggregation such as average pooling performs strongly and robustly. Together these suggest practical viability for weakly labeled ReID tasks. The work contributes both a new dataset and a scalable, label-efficient method that narrows the gap to fully supervised ReID in real-world settings.
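The objective above combines bag-level losses with fixed weights on top of a permutation-invariant bag embedding. The following is a minimal sketch that assumes per-image embeddings are already computed, so the ResNet-50 encoder is not reproduced. It uses the average-pooling aggregator that the findings report as a strong baseline. The function names, the unit weights, and the default γ = 0 are illustrative assumptions, not the authors' settings. The default γ = 0 reflects the finding that the alignment loss was empirically ineffective.

```python
from typing import List, Sequence

Vector = List[float]


def average_pool(instance_embeddings: Sequence[Vector]) -> Vector:
    """Permutation-invariant bag embedding: element-wise mean of the
    per-image (instance) embeddings in one bag."""
    if not instance_embeddings:
        raise ValueError("a bag must contain at least one instance")
    n = len(instance_embeddings)
    dim = len(instance_embeddings[0])
    return [sum(vec[d] for vec in instance_embeddings) / n for d in range(dim)]


def total_loss(l_triplet: float, l_ce: float, l_align: float = 0.0,
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.0) -> float:
    """L = alpha * L_triplet + beta * L_CE + gamma * L_align.
    The default weights are hypothetical; gamma = 0 disables the alignment term."""
    return alpha * l_triplet + beta * l_ce + gamma * l_align


if __name__ == "__main__":
    bag = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
    print(average_pool(bag))                  # [3.0, 2.0]
    print(average_pool(list(reversed(bag))))  # same result: order does not matter
    print(total_loss(0.5, 1.2, l_align=0.3))  # gamma = 0, so 1.7
```

A learned set-transformer aggregator would replace only `average_pool` and would change only how the bag embedding is computed. The weighted loss combination would stay the same.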
Abstract
The acquisition of large-scale, precisely labeled datasets for person re-identification (ReID) poses a significant challenge. Weakly supervised ReID has begun to address this issue, although its performance still lags behind that of fully supervised methods. In response, we introduce Contrastive Multiple Instance Learning (CMIL), a novel framework tailored for more effective weakly supervised ReID. CMIL requires only a single model and no pseudo-labels. It also leverages contrastive losses, a technique that has significantly improved traditional ReID performance yet is absent from all prior MIL-based approaches. We conducted extensive experiments and analysis across three datasets. They show that CMIL matches state-of-the-art performance on the large-scale SYSU-30k dataset with fewer assumptions. CMIL also consistently outperforms all baselines on the WL-Market1501 and Weakly Labeled MUddy racer re-iDentification (WL-MUDD) datasets. We introduce and release the WL-MUDD dataset, an extension of the MUDD dataset featuring naturally occurring weak labels from the real-world application at PerformancePhoto.co. All our code and data are accessible at https://drive.google.com/file/d/1rjMbWB6m-apHF3Wg_cfqc8QqKgQ21AsT/view?usp=drive_link.
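A contrastive loss can be applied at the bag level by comparing bag embeddings rather than individual images. The following is a minimal sketch using the standard triplet margin loss. It assumes Euclidean distance and a margin of 0.3. It also assumes that the anchor and positive are bags sharing a weak label while the negative bag does not. The abstract does not specify the distance, margin, or triplet-mining strategy, so these choices and the helper names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two bag embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bag_triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                     negative: Sequence[float], margin: float = 0.3) -> float:
    """Hinge on d(a, p) - d(a, n) + margin. The loss is zero once the negative
    bag is at least `margin` farther from the anchor than the positive bag."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    positive = [3.0, 4.0]       # distance 5.0 from the anchor
    far_negative = [6.0, 8.0]   # distance 10.0: already well separated
    near_negative = [0.0, 4.0]  # distance 4.0: closer than the positive
    print(bag_triplet_loss(anchor, positive, far_negative))  # 0.0
    print(bag_triplet_loss(anchor, positive, near_negative))
```

Only the near negative is penalized: it sits closer to the anchor than the positive does, so the hinge is active and pushes the bag embeddings apart.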
