Learning with Noisy Labels: Interconnection of Two Expectation-Maximizations

Heewon Kim; Hyun Sung Chang; Kiho Cho; Jaeyun Lee; Bohyung Han

TL;DR

This work tackles learning with noisy labels (LNL) by formulating a probabilistic objective with a structured data manifold and introducing LNL-flywheel, a two-EM framework with a main network for distinguishing clean vs corrupted data and an auxiliary network for refurbishing corrupted labels. The two EM cycles are interlinked, sharing a single objective and aided by a confidence regularizer to prevent collapse, enabling cooperative optimization and robust learning under diverse label-noise conditions. Empirical results on CIFAR-10/100, Tiny-ImageNet, Clothing1M, and WebVision show state-of-the-art or competitive performance across symmetric, asymmetric, and instance-dependent noise, with the practical advantage of single-model inference. The approach offers a scalable, memory-efficient alternative to ensembles, providing substantial robustness and improved data utilization through label refurbishment and pseudo-labeling strategies.

Abstract

Labor-intensive labeling has become a bottleneck in developing computer vision algorithms based on deep learning. For this reason, dealing with imperfect labels has gained increasing attention and has become an active field of study. We address the learning with noisy labels (LNL) problem, which we formalize as the task of finding a structured manifold amid noisy data. In this framework, we provide a proper objective function and an optimization algorithm based on two expectation-maximization (EM) cycles. The separate networks associated with the two EM cycles collaborate to optimize the objective function: one model distinguishes clean labels from corrupted ones, while the other refurbishes the corrupted labels. This approach yields a non-collapsing model, LNL-flywheel. Experiments show that our algorithm achieves state-of-the-art performance on multiple standard benchmarks, with substantial margins under various types of label noise.

Paper Structure (24 sections, 1 theorem, 24 equations, 5 figures, 7 tables)

Introduction
Related work
LNL-flywheel model
Overview
EM for main network
EM for auxiliary network
Experiments
Implementation details
Initialization
Data augmentation
Hyperparameters
Inference
Datasets and training details
CIFAR-10 and CIFAR-100
Tiny-ImageNet
...and 9 more sections

Key Result

Lemma 1

Suppose a neural classifier $g_\theta(x)$ has enough capacity to realize an arbitrary function. We consider the following regularized loss function for training: If the data mixture distribution has the same form as our model, i.e., $p_{\rm data}(\hat{y}|x) = \gamma' p_\pi(\hat{y}|x)+(1-\gamma')\epsilon$ for some $\gamma'\in (0,1)$, and if the labels in the data are uniformly distributed, i

Figures (5)

Figure 1: An illustration of the proposed framework. The main EM and the auxiliary EM are jointly connected, each with a distinct role, which results in cooperative learning for the LNL problem.
Figure 2: Training LNL-flywheel. LNL-flywheel starts with ① estimating the clean probability of each datapoint (Eq. \ref{eq:e-for-main}) after some initialization (Sec. \ref{sub:implementation}). This value is transmitted to the auxiliary side and used for ② the true class probability estimation (Eq. \ref{eq:qy-for-cleanish}). The true class probability serves as a supervisory signal for the auxiliary network. In ③ training the auxiliary network, contrastive learning is used together with strong augmentation (Sec. \ref{sub:implementation}). After the parameters $f_\phi$ and $T_c$ are updated, the resampled dataset $\{(x_i,\hat{y}_i)\}_{i=1}^N$ is generated by $\hat{y}_i=\arg\max_y f_\phi(x_i)[y]$, and $\{\epsilon_i\}_{i=1}^N$ is calculated on the basis of $T_c$ (Eq. \ref{eq:epsilon}). Then, the main network is ④ trained using the resampled dataset, with a confidence regularizer added to the loss (Eq. \ref{eq:resampled-loss}). Finally, the parameters $g_\theta$ and $\gamma$ are updated, and the next cycle begins.
Figure 3: Visualization of label corruption matrices for synthetic (CIFAR-10) and real (Clothing1M) noise. Upper: ground truth. Lower: estimated. The color scale is displayed at the top. In each figure, the $i$th column corresponds to the corruption probability of the $i$th source class, i.e., $T_{c}(i,\cdot)$. The ground truth for Clothing1M is obtained by manual inspection of a small fraction of the entire dataset (Xiao et al., 2015).
Figure 4: t-SNE visualizations of network features of test images. Networks are trained for 300 epochs using (a) DivideMix, (b) UNICON, and (c) LNL-flywheel on the CIFAR-10 dataset with $90\%$ symmetric noise.
Figure 5: Comparison of clean sample selection performance (left) and label refurbishment accuracy (right) with other methods over epochs, on CIFAR-10 with 90% symmetric noise.
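
Two steps from the Figure 2 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The clean-probability formula is the standard posterior of a two-component mixture of the form $\gamma\, g_\theta(x)[\hat{y}] + (1-\gamma)\epsilon$ given in Lemma 1, used here as an assumption because the paper's exact E-step equation is not reproduced on this page. The function names `clean_posterior` and `resample_labels` and the toy arrays are hypothetical.

```python
import numpy as np


def clean_posterior(main_probs, labels, gamma, eps):
    """Posterior probability that each (x_i, y_i) pair is clean.

    Assumes the mixture gamma * g(x)[y] + (1 - gamma) * eps, where
    main_probs[i] stands in for the main network output g(x_i) and eps
    is a scalar or a per-sample array (assumption, not the paper's Eq.).
    """
    idx = np.arange(len(labels))
    lik_clean = gamma * main_probs[idx, labels]   # clean component
    lik_noise = (1.0 - gamma) * np.asarray(eps)   # corrupted component
    return lik_clean / (lik_clean + lik_noise)


def resample_labels(aux_probs):
    """Resampled labels: y_i = argmax_y f(x_i)[y], as in the caption."""
    return aux_probs.argmax(axis=1)


# Toy data: 2 samples, 3 classes (values chosen only for illustration).
main_probs = np.array([[0.90, 0.05, 0.05],
                       [0.10, 0.10, 0.80]])
noisy_labels = np.array([0, 1])

w = clean_posterior(main_probs, noisy_labels, gamma=0.5, eps=1.0 / 3.0)
new_labels = resample_labels(main_probs)  # reusing main_probs as stand-in
print(w, new_labels)
```

In this toy example the first sample, whose given label matches the confident prediction, receives a clean probability well above 0.5, while the second, whose label disagrees with the prediction, falls below 0.5.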

Theorems & Definitions (2)

Lemma 1
proof
