Dirichlet-based Per-Sample Weighting by Transition Matrix for Noisy Label Learning

HeeSun Bae, Seungjae Shin, Byeonghu Na, Il-Chul Moon

TL;DR

This work argues that good utilization of the transition matrix is crucial and proposes a new utilization method based on resampling, RENT (REsampling method with Noise Transition matrix). RENT consistently outperforms existing transition matrix utilization methods, including reweighting, on various benchmark datasets.

Abstract

For learning with noisy labels, the transition matrix, which explicitly models the relation between the noisy label distribution and the clean label distribution, has been utilized to achieve the statistical consistency of either the classifier or the risk. Previous research has focused more on how to estimate this transition matrix well than on how to utilize it. We argue that good utilization of the transition matrix is crucial and suggest a new utilization method based on resampling, coined RENT. Specifically, we first demonstrate that current utilizations can have limitations in implementation. As an extension of reweighting, we suggest the Dirichlet distribution-based per-sample Weight Sampling (DWS) framework and compare reweighting and resampling under the DWS framework. Based on the analyses from DWS, we propose RENT, a REsampling method with Noise Transition matrix. Empirically, RENT consistently outperforms existing transition matrix utilization methods, including reweighting, on various benchmark datasets. Our code is available at https://github.com/BaeHeeSun/RENT.
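
To make the contrast between reweighting, resampling, and Dirichlet-based weight sampling concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the per-sample losses and raw weights are hypothetical inputs, the function names are invented for this example, and how the weights are derived from the transition matrix is left out because the text above does not specify it.

```python
import numpy as np

def normalize_weights(raw_weights):
    """Turn nonnegative per-sample scores into a mean vector mu that sums to 1."""
    raw = np.asarray(raw_weights, dtype=float)
    if np.any(raw < 0) or raw.sum() <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    return raw / raw.sum()

def reweighted_loss(losses, mu):
    """Reweighting: every sample contributes, scaled by its weight mu_i."""
    return float(np.dot(mu, losses))

def resampled_loss(losses, mu, rng, num_draws=None):
    """Resampling: draw indices with probability mu_i, then average their losses."""
    losses = np.asarray(losses, dtype=float)
    n = len(losses) if num_draws is None else num_draws
    idx = rng.choice(len(losses), size=n, replace=True, p=mu)
    return float(losses[idx].mean())

def dws_loss(losses, mu, alpha, rng):
    """Draw w ~ Dir(alpha * mu) and weight the losses by w.

    Assumes every entry of mu is strictly positive, as the Dirichlet requires.
    """
    w = rng.dirichlet(alpha * np.asarray(mu, dtype=float))
    return float(np.dot(w, losses))

rng = np.random.default_rng(0)
losses = np.array([0.2, 1.5, 0.4, 2.0])          # hypothetical per-sample losses
mu = normalize_weights([1.0, 0.1, 0.9, 0.05])    # hypothetical raw weights

print("reweighted:", reweighted_loss(losses, mu))
print("resampled: ", resampled_loss(losses, mu, rng))
print("DWS draw:  ", dws_loss(losses, mu, alpha=1.0, rng=rng))
```

In this sketch the resampled loss is a noisy estimate whose expectation equals the reweighted loss, while the Dirichlet draw injects a different kind of per-sample weight noise controlled by `alpha`.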

Paper Structure (61 sections, 2 theorems, 16 equations, 20 figures, 15 tables, 2 algorithms)

This paper contains 61 sections, 2 theorems, 16 equations, 20 figures, 15 tables, 2 algorithms.

Key Result

Proposition 3.1

If $\boldsymbol{\mu}^*$ is accessible, $R_{l,\text{RENT}}^{\text{emp}}$ is statistically consistent to $R_l$ (proof in the appendix).

Figures (20)

  • Figure 1: Dirichlet distribution-based per-sample Weight Sampling with shape parameter $\alpha$ and mean vector $\boldsymbol{\mu}$. The images at the vertices of the yellow triangles represent data instances. The blocks above the images show the true class and the noisy label. The sides show example implementations of the sampled $\boldsymbol{w}$: $\boldsymbol{w}^{(1)}$ assigns weights to all data (Reweighting), while $\boldsymbol{w}^{(2)}$ simulates resampling a refined dataset (RENT).
  • Figure 2: Density plot of $\text{Dir}(\alpha\boldsymbol{\mu})$ for different $\alpha$, with $\boldsymbol{\mu}$ set to $[0.7, 0.2, 0.1]$ for this illustration. The star ($\star$) denotes the mean $\boldsymbol{\mu}$, which is invariant to $\alpha$. Yellow denotes lower density, and density increases progressively toward violet.
  • Figure 3: Test accuracy with regard to various $\alpha$ for CIFAR10. The star ($\star$) denotes RENT and the cross ($\times$) denotes RW.
  • Figure 4: Test accuracies over various $\sigma$ for CIFAR10. RW+$\epsilon$ denotes the integration of RW and the label perturbation technique.
  • Figure 5: Histogram of $w_i$ of RENT on CIFAR10, with Cycle used for $T$ estimation. Blue and orange represent samples with clean and noisy labels, respectively. The vertical dotted line denotes $1/B$.
  • ...and 15 more figures
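
The Figure 2 caption notes that the mean of $\text{Dir}(\alpha\boldsymbol{\mu})$ does not depend on $\alpha$. The short simulation below is a sketch that checks this numerically for $\boldsymbol{\mu} = [0.7, 0.2, 0.1]$. It also compares the sample variance with the standard Dirichlet variance formula $\mu_i(1-\mu_i)/(\alpha+1)$, which follows when the concentration parameters sum to $\alpha$. The helper name `dirichlet_stats`, the chosen $\alpha$ values, and the sample size are assumptions made for illustration.

```python
import numpy as np

def dirichlet_stats(mu, alpha, num_samples=50_000, seed=0):
    """Sample from Dir(alpha * mu) and return the empirical mean and variance."""
    rng = np.random.default_rng(seed)
    concentration = alpha * np.asarray(mu, dtype=float)
    samples = rng.dirichlet(concentration, size=num_samples)
    return samples.mean(axis=0), samples.var(axis=0)

mu = np.array([0.7, 0.2, 0.1])  # mean vector from the Figure 2 caption

for alpha in (1.0, 10.0, 100.0):  # illustrative values, not taken from the paper
    mean, var = dirichlet_stats(mu, alpha)
    theory_var = mu * (1.0 - mu) / (alpha + 1.0)
    print(f"alpha={alpha:6.1f}  mean={mean.round(3)}  "
          f"var={var.round(4)}  theory_var={theory_var.round(4)}")
```

The empirical mean stays close to $\boldsymbol{\mu}$ for every $\alpha$, while the variance shrinks as $\alpha$ grows, so larger $\alpha$ concentrates the sampled weights around the mean.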

Theorems & Definitions (4)

  • Proposition 3.1
  • Remark C.1
  • Proposition D.1
  • proof