Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection
Zhipeng Yu, Qianqian Xu, Yangbangyan Jiang, Yingfei Sun, Qingming Huang
TL;DR
SGPS tackles noisy labels in deep metric learning (DML) by keeping noisy samples rather than discarding them, and by enriching their supervision with subgroup-informed positive pairs and learned prototypes. The approach combines probability-based clean-sample selection (PCS), a Subgroup Generation Module (SGM) that uncovers cleaner subgroup structure, and a Positive Prototype Generation Module (PPM) that forms informative prototypes, all optimized with a dual loss that jointly handles clean and noisy samples. Experiments on synthetic and real-world noisy datasets show that SGPS outperforms state-of-the-art noisy-label DML methods while making fuller use of the training data in tasks such as image retrieval and face recognition. The method can be integrated into the training process of existing pair-wise DML tasks.
Abstract
Noisy labels in real-world data degrade the performance of deep learning models. Although much research effort has been devoted to improving robustness to noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy-label learning methods designed for DML mainly discard suspected noisy samples, which wastes training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance sample utilization. Specifically, SGPS first identifies clean and noisy samples with a probability-based clean sample selection strategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module, and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, such as image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods. Code is available at https://github.com/smuelpeng/SGPS-NoiseFreeDML.
