Robust Noisy Label Learning via Two-Stream Sample Distillation

Sihan Bai; Sanping Zhou; Zheng Qin; Le Wang; Nanning Zheng

TL;DR

The paper tackles noisy-label learning with Two-Stream Sample Distillation (TSSD), a framework that combines loss-space priors with feature-space structure to select high-quality training samples from noisy data. It comprises two modules. Parallel Sample Division (PSD) partitions the data into a certain set and an uncertain set using cues from both spaces together with Gaussian Mixture Models. Meta Sample Purification (MSP) trains a meta-classifier on golden data to recover semi-hard samples from the uncertain set. A semi-supervised learning stage then treats the certain set as labeled and the uncertain set as unlabeled, training the model with refined labels and MixMatch-like losses under the overall objective $L = L_C + \lambda_u L_U + \lambda_r L_{reg}$. Experiments on CIFAR-10/100, Tiny-ImageNet, and Clothing-1M show state-of-the-art or competitive performance across noise settings, which supports jointly using loss and feature information for sample distillation. One limitation is that selection relies on only two spaces (loss and feature); incorporating additional selection criteria is left to future work.
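To make the objective $L = L_C + \lambda_u L_U + \lambda_r L_{reg}$ concrete, here is a minimal pure-Python sketch. The summary only says the losses are "MixMatch-like", so the specific forms below are assumptions rather than the authors' definitions: cross-entropy for $L_C$ on the certain (labeled) set, mean squared error for $L_U$ on the uncertain (unlabeled) set, and a penalty pulling the average prediction toward a uniform class prior for $L_{reg}$. All function names are hypothetical, and the weights are left as required arguments because the summary does not state their values.

```python
import math

def cross_entropy(probs, targets):
    """Assumed L_C: mean cross-entropy between predictions and (soft) targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        total += -sum(ti * math.log(max(pi, 1e-12)) for pi, ti in zip(p, t))
    return total / len(probs)

def mean_squared_error(probs, targets):
    """Assumed L_U: squared error averaged over classes and samples."""
    total = 0.0
    for p, t in zip(probs, targets):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(p)
    return total / len(probs)

def uniform_prior_penalty(probs):
    """Assumed L_reg: KL(uniform || mean prediction); zero when the mean is uniform."""
    n_classes = len(probs[0])
    mean_pred = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    prior = 1.0 / n_classes
    return sum(prior * math.log(prior / max(m, 1e-12)) for m in mean_pred)

def total_loss(labeled_probs, labeled_targets,
               unlabeled_probs, unlabeled_targets,
               lambda_u, lambda_r):
    """Combine the three terms as L = L_C + lambda_u * L_U + lambda_r * L_reg."""
    l_c = cross_entropy(labeled_probs, labeled_targets)
    l_u = mean_squared_error(unlabeled_probs, unlabeled_targets)
    l_reg = uniform_prior_penalty(labeled_probs + unlabeled_probs)
    return l_c + lambda_u * l_u + lambda_r * l_reg
```

In a real training loop these terms would be computed on mini-batch tensors with automatic differentiation; the list-based version here is only meant to show how the three terms are weighted and summed.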

Abstract

Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning. Existing work conducts either sample selection or label correction to deal with noisy labels during model training. In this paper, we design a simple yet effective sample selection framework, termed Two-Stream Sample Distillation (TSSD), for noisy label learning, which can extract more high-quality samples with clean labels to improve the robustness of network training. Firstly, a novel Parallel Sample Division (PSD) module is designed to generate a certain training set with sufficient reliable positive and negative samples by jointly considering the sample structure in feature space and the human prior in loss space. Secondly, a novel Meta Sample Purification (MSP) module is further designed to mine adequate semi-hard samples from the remaining uncertain training set by learning a strong meta classifier with extra golden data. As a result, progressively more high-quality samples are distilled from the noisy training set in every iteration to train networks robustly. Extensive experiments on four benchmark datasets, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and Clothing-1M, show that our method achieves state-of-the-art results compared with its competitors.
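The division step described above can be sketched in a few lines of pure Python. This is an illustrative reading, not the authors' implementation: it assumes a two-component 1-D Gaussian mixture is fitted separately to per-sample losses and to a feature-space distance (for example, distance to a class centroid), and that samples on which the two cues agree form the certain set (positive if both say clean, negative if both say noisy) while disagreements go to the uncertain set. The function names, the distance cue, and the 0.5 threshold are all hypothetical.

```python
import math

def fit_two_gaussians(values, n_iter=50):
    """EM for a 1-D two-component Gaussian mixture.

    Returns, per value, the posterior of the low-mean component,
    read here as P(clean): small loss or small distance.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]
    variances = [((hi - lo) ** 2) / 4 + 1e-6] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for v in values:
            dens = [
                weights[k]
                * math.exp(-((v - means[k]) ** 2) / (2 * variances[k]))
                / math.sqrt(2 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens) + 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            variances[k] = (
                sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
                + 1e-6
            )
            weights[k] = nk / len(values)
    low = 0 if means[0] <= means[1] else 1
    return [r[low] for r in resp]

def parallel_division(loss_clean_prob, feat_clean_prob, threshold=0.5):
    """Split sample indices by agreement of the two cues (assumed rule)."""
    certain_pos, certain_neg, uncertain = [], [], []
    for i, (pl, pf) in enumerate(zip(loss_clean_prob, feat_clean_prob)):
        loss_clean, feat_clean = pl > threshold, pf > threshold
        if loss_clean and feat_clean:
            certain_pos.append(i)      # both cues say clean
        elif not loss_clean and not feat_clean:
            certain_neg.append(i)      # both cues say noisy
        else:
            uncertain.append(i)        # cues disagree
    return certain_pos, certain_neg, uncertain

# Toy inputs for one label cluster (made-up numbers for illustration only)
losses = [0.05, 0.1, 0.08, 2.0, 2.2, 0.12, 1.9, 0.07]
dists = [0.2, 0.25, 1.5, 1.6, 1.7, 0.3, 0.22, 0.18]
pos, neg, unc = parallel_division(fit_two_gaussians(losses),
                                  fit_two_gaussians(dists))
```

On the toy inputs, samples 2 and 6 are the ones where the loss cue and the feature cue disagree, so they land in the uncertain set that a purification step would then revisit.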

Paper Structure (13 sections, 19 equations, 9 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Proposed Method
Preliminaries
Parallel Sample Division
Meta Sample Purification
Semi-Supervised Learning
Experiments
Datasets
Training Details
Comparison with the State-of-the-Art Methods
Ablation Studies
Conclusions

Figures (9)

Figure 1: Number of samples on which loss-space and feature-space selection disagree. Across noise settings and datasets, the sets filtered by the loss-based and feature-based methods always differ in part, reflecting the differences between the two criteria. The results are obtained with the model from the first epoch after warm-up training on the CIFAR-10 and CIFAR-100 datasets.
Figure 2: Illustration of the PSD module, which divides the training samples into a certain set and an uncertain set based on information from both feature space and loss space. Samples with green and red borders are considered to have clean and noisy labels, respectively.
Figure 3: Framework of our Two-Stream Sample Distillation. First, the training samples are divided into different noisy clusters based on their given labels. Second, the backbone extracts features from the samples in each cluster, which are then passed to the subsequent modules: (1) the PSD module jointly considers the human prior in loss space and the sample structure in feature space to generate the positive and negative samples in the certain set; (2) the MSP module trains a binary classifier with golden data to identify additional semi-hard samples in the uncertain set. Third, we take samples in the certain set as labeled data and samples in the uncertain set as unlabeled data, after which an off-the-shelf semi-supervised learning algorithm is used to train a robust network.
Figure 4: Causal diagram of data division in the noisy label learning problem. By selecting all samples in the label set $\tilde{\mathcal{Y}}$ whose label is $k_i$, we can analyze the impact of labels $k_i$ and images $x_j$ on the network prediction $f$ in terms of loss and feature, respectively.
Figure 5: Architecture of our meta network, in which: (1) we use a simple two-layer MLP as the mapping function; (2) we use the positive and negative samples in the certain set as meta data during training.
...and 4 more figures
