Instance-dependent Label Distribution Estimation for Learning with Label Noise

Zehui Liao, Shishuai Hu, Yutong Xie, Yong Xia

TL;DR

The proposed Instance-dependent Label Distribution Estimation (ILDE) method is evaluated against several state-of-the-art methods on two synthetic and three real-world noisy datasets, and the results indicate that it outperforms all competing methods.

Abstract

Noise transition matrix (NTM) estimation is a promising approach for learning with label noise. It can infer clean posterior probabilities, known as the Label Distribution (LD), from noisy ones and thereby reduce the impact of noisy labels. However, this estimation is challenging because ground-truth labels are not always available. Most existing methods estimate a global NTM using either correctly labeled samples (anchor points) or detected reliable samples (pseudo anchor points). These methods rely heavily on the existence of anchor points or the quality of pseudo ones, and a global NTM can hardly provide accurate label transition information for each sample, since label noise in real applications is mostly instance-dependent. To address these challenges, we propose an Instance-dependent Label Distribution Estimation (ILDE) method to learn from noisy labels for image classification. The method's workflow has three major steps. First, we estimate each sample's noisy posterior probability, supervised by the noisy labels. Second, since the mislabeling probability closely correlates with inter-class correlation, we compute the inter-class correlation matrix to estimate the NTM, bypassing the need for (pseudo) anchor points. Moreover, for a precise approximation of the instance-dependent NTM, we calculate the inter-class correlation matrix using only mini-batch samples rather than the entire training dataset. Third, we transform the noisy posterior probability into an instance-dependent LD by multiplying it with the estimated NTM, and use the resulting LD as enhanced supervision to prevent deep convolutional neural networks (DCNNs) from memorizing noisy labels. The proposed ILDE method has been evaluated against several state-of-the-art methods on two synthetic and three real-world noisy datasets. Our results indicate that ILDE outperforms all competing methods, whether the noise is synthetic or real.
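The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formula that turns the mini-batch inter-class correlation matrix into an NTM, so the sketch assumes the correlation is $P^\top P$ over the batch's noisy posteriors $P$ (shape $N_b \times C$), row-normalized so each row sums to 1, and that each sample's LD is its posterior row vector multiplied by that matrix. The function names (`softmax`, `estimate_batch_ntm`, `estimate_label_distribution`) are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax: the noisy posterior for each sample (Step 1)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def estimate_batch_ntm(noisy_post: np.ndarray) -> np.ndarray:
    """Step 2 (ASSUMED form): inter-class correlation of the mini-batch posteriors,
    row-normalized into a C x C row-stochastic transition matrix."""
    corr = noisy_post.T @ noisy_post                 # (C, C), symmetric, non-negative
    return corr / corr.sum(axis=1, keepdims=True)


def estimate_label_distribution(logits: np.ndarray) -> np.ndarray:
    """Step 3: multiply each noisy posterior by the batch-level NTM."""
    noisy_post = softmax(logits)                     # (N_b, C)
    ntm = estimate_batch_ntm(noisy_post)             # (C, C), recomputed per batch
    return noisy_post @ ntm                          # (N_b, C), rows sum to 1


# Hypothetical mini-batch of logits: N_b = 8 samples, C = 10 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 10))
ld = estimate_label_distribution(logits)
print(ld.shape)  # (8, 10)
```

Because the assumed NTM is row-stochastic, every LD row remains a valid probability distribution. Recomputing the matrix for each mini-batch, as the abstract describes, lets it vary with the samples in the batch instead of being fixed for the whole dataset.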

Paper Structure (28 sections, 11 equations, 5 figures, 8 tables)

Figures (5)

  • Figure 1: Inter-class correlation matrix and three noise transition matrices derived from the 'Worst', 'Random1', and 'Aggregated' noisy label sets of CIFAR-10N; the three color bars on the right are log-norm transformed for clarity. The noise transition matrices correlate with the inter-class correlation matrix; the Pearson and Spearman's rank correlation coefficients are displayed, highlighting the relationship between class similarity and mislabeling tendencies.
  • Figure 2: Overview of our Instance-dependent Label Distribution Estimation (ILDE) method. The backbone is used for image classification. The Instance-dependent LD Estimation Block produces the instance-dependent LD for each input image, and the noise transition matrix is estimated from all samples in a mini-batch ($N_b$ is the batch size). The loss consists of three terms: the noisy supervision ($\mathcal{L}_{NL}$), the instance-dependent LD supervision ($\mathcal{L}_{LD}$), and the diagonally-dominant constraint ($\mathcal{L}_M$). For the input image $\textbf{x}_i$, $\textbf{y}_i^{'}$ is the output of the backbone and $\textbf{y}_i^{*oh}$ denotes the one-hot observed label of $\textbf{x}_i$. $\textbf{y}_i^{LD}$, $\hat{\textbf{y}}_i^{LD}$, and $\bar{\textbf{y}}_i^{(t)}$ are the estimated LD, the sharpened LD, and the ensembled target, respectively. $\bar{\textbf{y}}_i^{(t)}$ is calculated as the weighted sum of the historical target $\bar{\textbf{y}}_i^{(t-1)}$ and the sharpened LD $\hat{\textbf{y}}_i^{LD}$ at the $t$-th epoch.
  • Figure 3: Test accuracy (%) of ILDE versus the value of (a) $\alpha$, (b) $\beta$, (c) $\tau$, and (d) $\delta$ on CIFAR-10 (asymmetric noise, noise rate 0.5). The mean and standard deviation over five runs are shown.
  • Figure 4: Memory accuracy of classifiers trained on the MNIST and CIFAR-10 datasets with symmetric noise (noise rate $\in \{0.2, 0.4, 0.6, 0.8\}$). Memory accuracy is computed from the model's predictions and the noisy labels of the training data.
  • Figure 5: Influence of the diagonally dominant regularization $\mathcal{L}_M$ on the accuracy of corrected labels when validating ILDE on CIFAR-10 with asymmetric noise (noise rate 0.5). The three sub-figures show (a) the effect of $\mathcal{L}_M$ on the ratio of estimated noise transition matrices that are diagonally dominant over epochs, and (b, c) the accuracy of the model predictions $\textbf{y}^{'}$ and of the corrected labels $\textbf{y}^{LD}$.
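The Pearson and Spearman's rank coefficients reported in Figure 1 compare two $C \times C$ matrices entry by entry. Below is a minimal NumPy sketch, assuming all matrix entries are flattened and compared (the caption does not say whether diagonal entries are excluded) and computing Spearman's coefficient as the Pearson coefficient of average ranks so that ties are handled. The function names and the example matrices are hypothetical, not data from the paper.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                   # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def matrix_agreement(ntm: np.ndarray, class_corr: np.ndarray) -> tuple:
    """(Pearson, Spearman) correlation between the entries of two C x C matrices."""
    a = np.asarray(ntm, dtype=float).ravel()
    b = np.asarray(class_corr, dtype=float).ravel()
    return pearson(a, b), pearson(average_ranks(a), average_ranks(b))


ntm = np.array([[0.70, 0.20, 0.10],
                [0.25, 0.60, 0.15],
                [0.05, 0.15, 0.80]])         # hypothetical estimated NTM
class_corr = np.array([[1.00, 0.40, 0.10],
                       [0.40, 1.00, 0.30],
                       [0.10, 0.30, 1.00]])  # hypothetical inter-class correlation
pearson_r, spearman_r = matrix_agreement(ntm, class_corr)
print(f"Pearson {pearson_r:.3f}, Spearman {spearman_r:.3f}")
```

Pearson measures linear agreement between the entries, while Spearman only checks whether both matrices order the class pairs the same way, which is why reporting both is informative.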
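The target construction in the Figure 2 caption (sharpen the estimated LD, then take a weighted sum with the previous epoch's target) can be sketched as below. The caption does not specify the sharpening function or the weight, so this sketch assumes MixMatch-style temperature sharpening and a fixed weight; `sharpen`, `update_target`, `temperature`, and `momentum` are hypothetical names, and their values are illustrative rather than the paper's settings.

```python
import numpy as np


def sharpen(ld: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """ASSUMED sharpening: raise probabilities to 1/temperature, then renormalize rows."""
    powered = ld ** (1.0 / temperature)
    return powered / powered.sum(axis=1, keepdims=True)


def update_target(prev_target: np.ndarray, sharpened_ld: np.ndarray,
                  momentum: float = 0.9) -> np.ndarray:
    """Weighted sum of the historical target (epoch t-1) and the sharpened LD (epoch t)."""
    return momentum * prev_target + (1.0 - momentum) * sharpened_ld


ld = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3]])                 # hypothetical estimated LDs
prev = np.full_like(ld, 1.0 / 3.0)               # hypothetical initial target: uniform
target = update_target(prev, sharpen(ld))
print(target.sum(axis=1))                        # each row should sum to 1
```

A temperature below 1 makes each LD more confident without changing its most likely class, and the weighted sum smooths the target across epochs so that a single noisy estimate cannot overwrite it.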
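The two diagnostics in Figures 4 and 5(a) can be computed with a few lines of NumPy. This sketch assumes that memory accuracy is the fraction of training samples whose predicted class matches the noisy label, and that a transition matrix counts as diagonally dominant when, in every row, the diagonal entry exceeds each off-diagonal entry (a common convention in the noisy-label literature; the matrix-analysis definition instead compares the diagonal with the sum of the off-diagonal entries). All function names and example values are hypothetical.

```python
from typing import Sequence

import numpy as np


def memory_accuracy(pred_probs: np.ndarray, noisy_labels: np.ndarray) -> float:
    """Fraction of training samples whose predicted class equals the noisy label."""
    return float(np.mean(pred_probs.argmax(axis=1) == noisy_labels))


def is_diagonally_dominant(ntm: np.ndarray) -> bool:
    """ASSUMED criterion: each diagonal entry exceeds every off-diagonal entry in its row."""
    off = ntm.astype(float, copy=True)
    diag = np.diag(off).copy()
    np.fill_diagonal(off, -np.inf)               # ignore the diagonal when taking row maxima
    return bool(np.all(diag > off.max(axis=1)))


def dominant_ratio(ntms: Sequence[np.ndarray]) -> float:
    """Share of estimated matrices (e.g. one per mini-batch) that are diagonally dominant."""
    return float(np.mean([is_diagonally_dominant(m) for m in ntms]))


probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 1, 1])                  # hypothetical noisy labels
print(memory_accuracy(probs, labels))            # 0.75

ntm_a = np.array([[0.8, 0.2], [0.3, 0.7]])       # diagonal wins in both rows
ntm_b = np.array([[0.4, 0.6], [0.1, 0.9]])       # first row violates the criterion
print(dominant_ratio([ntm_a, ntm_b]))            # 0.5
```

A high memory accuracy under heavy noise signals that the classifier is fitting the wrong labels, which is the behavior the LD supervision is meant to suppress.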