Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection

Zhipeng Yu, Qianqian Xu, Yangbangyan Jiang, Yingfei Sun, Qingming Huang

TL;DR

SGPS tackles noisy labels in deep metric learning by not discarding noisy samples, but rather enriching their supervision through subgroup-informed positive pairs and learned prototypes. The approach combines probability-based clean-sample selection (PCS), a Subgroup Generation Module (SGM) to uncover cleaner subgroup structures, and a Positive Prototype Generation Module (PPM) to form informative prototypes, optimized with a dual loss that jointly handles clean and noisy samples. Extensive experiments on synthetic and real-world noisy datasets demonstrate that SGPS consistently surpasses state-of-the-art noisy-label DML methods, offering improved sample utilization and scalability for tasks like image retrieval and face recognition. The method provides practical gains in robustness and can be readily integrated with existing DML losses and architectures.

Abstract

The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving the robustness towards noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy label learning methods designed for DML mainly discard suspicious noisy samples, resulting in a waste of the training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance the sample utilization. Specifically, SGPS first identifies clean and noisy samples with a probability-based clean sample selection strategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, like image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods. Code is available at https://github.com/smuelpeng/SGPS-NoiseFreeDML.

Paper Structure (19 sections, 15 equations, 10 figures, 6 tables, 4 algorithms)

Figures (10)

  • Figure 1: (a) Performance drop as a function of the noise level for image classification and DML tasks (image retrieval and face recognition). (b) Precision@1 of the clean training subset and test set for different methods on CARS. (c) Precision@1 of PRISM [PRISM2021] and clustering-based denoising methods, i.e., K-means [kmeans++] and agglomerative hierarchical clustering (AHC) [nielsen2016hierarchical], on CARS with 50% symmetric noise.
  • Figure 2: The framework of our proposed method. The input batch is first fed into the feature extractor network to obtain the features. The PCS module then separates the inputs into a clean set and a noisy set. Samples in the clean set are used to compute $L_{\text{clean}}$. Based on the subgroup labels generated by SGM, each sample in the noisy set obtains its corresponding positive pairs $\mathcal{P}_{i}$. PPM aggregates $\mathcal{P}_{i}$ into a positive prototype $\bm{r}_{i}$, which is used together with the noisy samples to compute $L_{\text{noise}}$.
  • Figure 3: The workflow of SGM. SGM will maintain two kinds of subgroup labels for each sample in the dataset. The subgrouping module will update the subgroup labels conditionally once new features are added to the feature bank, then update the subgroup labels in an asynchronous manner. The subgroup labels will be used to select positive pairs for each noisy sample in a batch.
  • Figure 4: Visualization of TransProto PPM. We employ a 3-layer transformer-based model that learns to aggregate the features of positive samples into the learned prototype $\bm{r}_{i}$.
  • Figure 5: Ablation of SGM and PPM on CARS and SOP with 50% symmetric noise.
  • ...and 5 more figures
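
The noisy-sample branch described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the subgroup labels are given by hand rather than produced by SGM, mean pooling stands in for the transformer-based PPM, and an InfoNCE-style term stands in for $L_{\text{noise}}$, whose exact form is not stated in this section. All function names and the `temperature` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_positives(i, subgroup_labels, feature_bank):
    """Collect features of all other samples sharing sample i's subgroup label."""
    return [
        feature_bank[j]
        for j, g in enumerate(subgroup_labels)
        if g == subgroup_labels[i] and j != i
    ]


def mean_prototype(features):
    """Assumption: average the positives (a stand-in for the learned PPM)."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def noisy_contrastive_loss(anchor, prototype, negatives, temperature=0.1):
    """Assumption: InfoNCE-style loss pulling the anchor toward its prototype."""
    pos = math.exp(cosine(anchor, prototype) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy feature bank with hand-assigned subgroup labels (hypothetical SGM output).
feature_bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
subgroup_labels = [0, 0, 1, 1]

i = 1  # index of a sample that a clean/noisy split flagged as noisy
positives = select_positives(i, subgroup_labels, feature_bank)
prototype = mean_prototype(positives)
negatives = [
    f for j, f in enumerate(feature_bank) if subgroup_labels[j] != subgroup_labels[i]
]

loss_matched = noisy_contrastive_loss(feature_bank[i], prototype, negatives)
loss_mismatched = noisy_contrastive_loss(feature_bank[i], [0.0, 1.0], negatives)
```

In this toy setup the loss is small when the prototype comes from the anchor's own subgroup and larger when a prototype from the other subgroup is substituted, which is the behavior a positive-prototype loss is meant to reward.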