Group Benefits Instances Selection for Data Purification

Zhenhuang Cai; Chuanyi Zhang; Dan Huang; Yuanbo Chen; Xiuyun Guan; Yazhou Yao

TL;DR

This work proposes GRIP, a method that alleviates the noisy-label problem on both synthetic and real-world datasets. By combining the advantages of noise-robust and noise-cleaning methods, it reduces the performance degradation caused by noisy labels.

Abstract

Manually annotating datasets for training deep models is labor-intensive and time-consuming. To avoid this cost, directly leveraging web images to construct training data is a natural choice. Nevertheless, the label noise present in web data usually degrades model performance. Existing methods for combating label noise are typically designed and tested on synthetic noisy datasets, and they often fail to achieve satisfactory results on real-world noisy datasets. To this end, we propose a method named GRIP to alleviate the noisy label problem for both synthetic and real-world datasets. Specifically, GRIP uses a group regularization strategy that estimates class soft labels to improve noise robustness. Soft label supervision reduces overfitting to noisy labels and captures inter-class similarities that benefit classification. Furthermore, an instance purification operation globally identifies noisy labels by measuring the difference between each training sample and its class soft label. Through operations at both the group and instance levels, our approach integrates the advantages of noise-robust and noise-cleaning methods and substantially alleviates the performance degradation caused by noisy labels. Comprehensive experimental results on synthetic and real-world datasets demonstrate the superiority of GRIP over existing state-of-the-art methods.
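The two operations named in the abstract can be illustrated with a minimal NumPy sketch. It follows the Figure 2 caption only loosely: class soft labels are smoothed with an exponential moving average (EMA), and each sample is scored by the JS divergence between its prediction and the soft label of its given class. The function names, the momentum value, the uniform initialization, and the toy probabilities are assumptions made for illustration, not the authors' implementation, and the paper's threshold for separating clean from noisy samples is not reproduced here.

```python
import numpy as np

def ema_update_soft_labels(soft_labels, probs, labels, momentum=0.5):
    """Blend each class's previous soft label with the mean prediction
    of the samples currently labeled as that class (EMA smoothing).
    The momentum value is an assumed hyperparameter."""
    updated = soft_labels.copy()
    for c in np.unique(labels):
        class_mean = probs[labels == c].mean(axis=0)
        updated[c] = momentum * soft_labels[c] + (1.0 - momentum) * class_mean
    return updated

def js_divergence(p, q, eps=1e-12):
    """Row-wise Jensen-Shannon divergence (natural log) between p and q."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=-1)
    kl_qm = np.sum(q * np.log(q / m), axis=-1)
    return 0.5 * (kl_pm + kl_qm)

# Toy data: four samples all labeled class 0; the last one looks mislabeled.
probs = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
])
labels = np.array([0, 0, 0, 0])

# Assumed initialization: uniform soft labels for 3 classes.
soft_labels = np.full((3, 3), 1.0 / 3.0)
soft_labels = ema_update_soft_labels(soft_labels, probs, labels)

# d[i] measures how far sample i's prediction is from its class soft label.
d = js_divergence(probs, soft_labels[labels])
print(d)
```

In this toy run the fourth sample receives the largest divergence, which is the kind of signal the caption describes for flagging a label as noisy.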

Paper Structure (33 sections, 12 equations, 9 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Noise-Robust
Noise-Cleaning
The Proposed Method
Preliminary
Group Regularization
Instance Purification
Discussion
Comparison with OLS
Dynamic and Fixed Thresholds
Selection Criterion
Experiments on Synthetic Noisy Datasets
Datasets and Evaluation Metric
Implementation Details
...and 18 more sections

Figures (9)

Figure 1: Our approach (b) improves on typical noise identification (a) through a group regularization strategy. Specifically, it uses the similarity between the predicted probability distribution of each sample and its class soft label to identify noisy labels. The predicted probability distributions of clean samples tend to be closer to class soft labels than those of noisy samples.
Figure 2: The framework of our proposed approach, with Web-bird [sun2021webly] as an example. In each epoch $t$, the network produces a probability $p(x_{i})$ for each image $x_{i}$. Then $p(x_{i})$ updates the soft label of its class, and EMA is used to smooth the update. The estimated class soft labels $S$ are leveraged in noise identification and provide supervision through $\mathcal{L}_{Soft}$. In noise identification, we compute the JS divergence $d_{i}$ between the probability $p(x_{i})$ and the soft label $S^{t-1}_{y_{i}}$ to select clean samples. For noisy samples, we compute the JS divergence $\hat{d}_{i}$ between the probability $p(x_{i})$ and the soft label of the predicted class, $S^{t-1}_{\hat{y}_{i}}$, to separate revisable instances from discarded ones. The prediction $\hat{y}_{i}$ is assigned as the pseudo label for each revisable sample. Finally, clean and revisable images are trained using $\mathcal{L}_{GR}$, and $\mathcal{L}_{ME}$ is applied to discarded ones as regularization.
Figure 3: Label distributions of $\mathcal{L}_{Soft}$ (a) and $\mathcal{L}_{Soft} + \mathcal{L}_{ME}$ (b) on Web-bird. We scale the $y$-axis using the log function for visualization. Soft labels are generated during the training process of a ResNet-18 model.
Figure 4: The distribution of $d$ in epoch $5$ (a) and $75$ (b) during the training process of a ResNet-18 model on Web-bird. The red bar indicates the threshold $thr$. As the training proceeds, $d_{i}$ decreases and $thr$ automatically adapts to the change of $d_{i}$. The distribution becomes more discrete because noisy samples are discarded.
Figure 5: The symmetric (a) and asymmetric (b) noise transition matrices and corresponding estimated soft labels after the warm-up period on CIFAR-10 ((c) and (d)). The noise ratios $\epsilon$ are set to $0.5$ and $0.4$ for symmetric and asymmetric noise, respectively.
...and 4 more figures
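The symmetric and asymmetric noise transition matrices mentioned in the Figure 5 caption can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact protocol: symmetric noise is assumed to flip a label uniformly to one of the other classes with probability $\epsilon$ (another common convention also allows flipping to the same class), and the asymmetric `flip_map` pairs below are hypothetical placeholders, since the listing does not state which classes are flipped. All function names are illustrative.

```python
import numpy as np

def symmetric_transition(num_classes, eps):
    """Keep a label with probability 1 - eps; otherwise flip it
    uniformly to one of the other classes (assumed convention)."""
    T = np.full((num_classes, num_classes), eps / (num_classes - 1))
    np.fill_diagonal(T, 1.0 - eps)
    return T

def asymmetric_transition(num_classes, eps, flip_map):
    """Flip each source class in flip_map to its target with probability eps;
    classes not in flip_map keep their labels."""
    T = np.eye(num_classes)
    for src, dst in flip_map.items():
        T[src, src] = 1.0 - eps
        T[src, dst] = eps
    return T

def corrupt_labels(labels, T, rng):
    """Sample a noisy label for each clean label from the matching row of T."""
    return np.array([rng.choice(len(T), p=T[y]) for y in labels])

# Noise ratios taken from the Figure 5 caption; 10 classes as in CIFAR-10.
T_sym = symmetric_transition(10, 0.5)
# Hypothetical class pairs, chosen only to show the matrix shape.
T_asym = asymmetric_transition(10, 0.4, {2: 0, 9: 1})

rng = np.random.default_rng(0)
clean = rng.integers(0, 10, size=1000)
noisy = corrupt_labels(clean, T_sym, rng)
print("fraction of flipped labels:", np.mean(noisy != clean))
```

Each row of a transition matrix is a probability distribution over the noisy label given the clean one, so every row sums to one.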
