Variational Rectification Inference for Learning with Noisy Labels

Haoliang Sun; Qi Wei; Lei Feng; Yupeng Hu; Fan Liu; Hehe Fan; Yilong Yin

Abstract

Label noise is widely observed in real-world datasets. To mitigate the negative impact of deep models overfitting to label noise, prevailing approaches apply strategies such as re-weighting or loss rectification, which are generally learned in a meta-learning setting. Although probabilistic meta-learning models achieve robustness to noise, they usually suffer from model collapse, which degrades generalization performance. In this paper, we propose variational rectification inference (VRI), which formulates the adaptive rectification of loss functions as an amortized variational inference problem and derives the evidence lower bound under the meta-learning framework. Specifically, VRI is constructed as a hierarchical Bayesian model that treats the rectifying vector as a latent variable; this vector rectifies the loss of a noisy sample with extra randomness regularization, making the model more robust to label noise. To infer the rectifying vector, we approximate its conditional posterior with an amortization meta-network. The variational term in VRI allows the conditional posterior to be estimated accurately and keeps it from collapsing to a Dirac delta function, which significantly improves generalization performance. The meta-network and prior network are designed to adhere to the smoothness assumption, enabling them to generate reliable rectification vectors. Given a set of clean meta-data, VRI can be efficiently meta-learned through bi-level optimization. In addition, theoretical analysis guarantees that the meta-network can be efficiently learned with our algorithm. Comprehensive comparison experiments and analyses validate its effectiveness for robust learning with noisy labels, particularly in the presence of open-set noise.
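
The ingredients named in the abstract (a latent rectifying vector, an amortized posterior, a prior, and a KL term that discourages collapse) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes diagonal Gaussian posterior and prior distributions, assumes that rectification is an elementwise scaling of the logits by the sampled vector, and uses hypothetical names throughout (`sample_v`, `kl_diag_gaussians`, `rectified_objective`, `beta`). The meta-network and prior network are replaced by their outputs (means and log-variances), which are passed in directly.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_v(mu, log_var, n_samples, rng):
    # Reparameterized samples v = mu + sigma * eps, shape (n_samples, *mu.shape).
    eps = rng.standard_normal((n_samples,) + mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    # KL(q || p) between diagonal Gaussians, summed over the last axis.
    var_q, var_p = np.exp(log_var_q), np.exp(log_var_p)
    return 0.5 * np.sum(
        log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        axis=-1,
    )

def rectified_objective(logits, labels, mu_q, log_var_q, mu_p, log_var_p,
                        n_samples=8, beta=1.0, seed=0):
    # Negative ELBO-style objective: Monte Carlo NLL plus a weighted KL term.
    rng = np.random.default_rng(seed)
    v = sample_v(mu_q, log_var_q, n_samples, rng)        # (S, B, C)
    # Assumption: the rectifying vector scales the logits elementwise.
    probs = softmax(v * logits[None, :, :])              # (S, B, C)
    predictive = probs.mean(axis=0)                      # (B, C)
    batch = np.arange(logits.shape[0])
    nll = -np.log(predictive[batch, labels] + 1e-12)     # (B,)
    kl = kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p)  # (B,)
    return float(np.mean(nll + beta * kl)), kl
```

In this sketch, a posterior whose variance shrinks toward zero (a near-Dirac distribution) is penalized by the KL term whenever the prior keeps a non-degenerate variance, which is the intuition behind using a variational term to avoid collapse.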

Paper Structure (19 sections, 2 theorems, 32 equations, 8 figures, 13 tables, 2 algorithms)

Introduction
Related Work
Method
Preliminaries
Variational Rectification Inference
Meta-Learning Process
The practical objectives
Bi-level optimization
Convergence Analysis
Experiments
Setup
Comparison results
Further Analysis
Ablation Study
Learning without Meta-Data
...and 4 more sections

Key Result

Lemma 1

Suppose the loss function $L$ w.r.t. $\theta$ in Eq. (eq:obj2) is $\ell$-smooth and $\tau$-Lipschitz, the KL term $D_{\mathrm{KL}}$ w.r.t. the output of $V(\phi)$ has an $o$-bounded gradient, and $V(\phi)$ is differentiable with a $\delta$-bounded gradient and twice differentiable with its $\zeta$-bound

Figures (8)

Figure 1: The meta-network can generate the rectifying vector to integrate into the inference of the classification network. The variational module can avoid model collapse via a prior network.
Figure 2: Flowchart of the learning algorithm. The solid and dashed lines denote forward and backward propagation, respectively. In each iteration, the meta-network $\phi$ generates the distribution of $\mathbf{v}$ and then produces multiple samples via the sampling module to estimate the predictive distribution. By computing the gradient through update step 4, the meta-network can be trained in step 5. The prior network is jointly optimized in step 6. The classification network $\theta$ is updated with the support of the learned meta-network in step 7.
Figure 3: (a) As the noise ratio rises, the rectification effect becomes more obvious since the area of the original loss increases. (b) We almost achieve the unbiased estimation for the initialized transition matrix of flip noise with varying noise ratio $\rho$.
Figure 4: Performance comparison as the noise ratio increases. VRI continues to deliver good performance even at significantly higher noise ratios.
Figure 5: Our algorithm achieves a stable convergence and displays robustness on flip noise with a high ratio (e.g., $60\%$).
...and 3 more figures

Theorems & Definitions (6)

Lemma 1: Smoothness
proof
Theorem 1: Convergence Rate
proof
proof
proof
