Learning Confident Classifiers in the Presence of Label Noise

Asma Ahmed Hashmi, Aigerim Zhumabayeva, Nikita Kotelevskii, Artem Agafonov, Mohammad Yaqub, Maxim Panov, Martin Takáč

TL;DR

This work addresses learning under label noise when multiple annotators label each example, a common scenario in medical imaging. It introduces a probabilistic framework with a base predictor p_theta(y|x) and an annotator (confusion) network T^psi_r(x) that models the label noise of each annotator r, so that the predicted distribution over annotator r's labels is p^r_{theta,psi}(x) = T^psi_r(x) p_theta(x) and the ground-truth distribution can be recovered from the base predictor. A confidence-regularization objective pushes the base predictor toward confident, near one-hot predictions, which helps disentangle annotator noise from the true labels. For segmentation, an additional loss term exploits confident regions, i.e. regions where all annotators agree. Experiments on synthetic and real-world datasets (MNIST, CIFAR-10, Fashion-MNIST, CIFAR-10N; LIDC; RIGA) show state-of-the-art or otherwise strong gains over several baselines, especially under heavy noise, along with improved Dice scores and stable, robust behavior for segmentation. The approach is practically relevant for building reliable classifiers and segmenters when annotators disagree, and the code is publicly released to support further research and benchmarking.
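
The composition p^r_{theta,psi}(x) = T^psi_r(x) p_theta(x) can be illustrated with a small, hedged sketch. It assumes that entry (i, j) of the confusion matrix is the probability that annotator r reports class i when the true class is j, so each column sums to 1. The summary above does not state the form of the confidence regularizer, so the entropy penalty below is only a stand-in assumption. The weight 0.01 echoes the $\lambda$ value in the Figure 2 caption, and all function names are hypothetical.

```python
import math


def noisy_label_distribution(confusion, base_probs):
    """Return T p: the predicted distribution over one annotator's labels.

    Assumption: confusion[i][j] = P(annotator label = i | true label = j),
    so every column of `confusion` sums to 1.
    """
    return [sum(t_ij * p_j for t_ij, p_j in zip(row, base_probs))
            for row in confusion]


def entropy(probs):
    """Shannon entropy in nats; it is zero for a one-hot vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(confusion, base_probs, annotator_label, lam=0.01):
    """Negative log-likelihood of the noisy label plus an entropy penalty.

    The entropy term is a stand-in for the confidence regularizer,
    not the paper's exact objective.
    """
    noisy = noisy_label_distribution(confusion, base_probs)
    nll = -math.log(noisy[annotator_label])
    return nll + lam * entropy(base_probs)


# Hypothetical 3-class example: base prediction and one annotator's confusion matrix.
base_probs = [0.7, 0.2, 0.1]
confusion = [
    [0.8, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.1, 0.0, 1.0],
]
noisy = noisy_label_distribution(confusion, base_probs)
loss = regularized_loss(confusion, base_probs, annotator_label=0)
```

In this sketch the penalty vanishes when the base prediction is one-hot, so minimizing the loss encourages the base predictor to stay confident while the confusion matrix absorbs the annotator's noise.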

Abstract

The success of Deep Neural Network (DNN) models depends significantly on the quality of the provided annotations. In medical image segmentation, for example, it is common to collect multiple expert annotations for each data point to reduce subjective annotation bias. The goal of estimation is then to filter out the label noise and recover the ground-truth masks, which are not explicitly given. This paper proposes a probabilistic model for noisy observations that allows us to build confident classification and segmentation models. To accomplish this, we explicitly model label noise and introduce a new information-based regularization that pushes the network to recover the ground-truth labels. In addition, for the segmentation task we adjust the loss function to prioritize learning in high-confidence regions where all the annotators agree on the labeling. We evaluate the proposed method on a series of classification tasks, including noisy versions of the MNIST, CIFAR-10, and Fashion-MNIST datasets as well as CIFAR-10N, a real-world dataset with noisy human annotations. For the segmentation task, we consider several medical imaging datasets, such as LIDC and RIGA, that reflect real-world variability among multiple annotators. Our experiments show that our algorithm outperforms state-of-the-art solutions on the considered classification and segmentation problems.
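
The high-confidence regions mentioned in the abstract, where all annotators agree on the labeling, can be located with a simple agreement mask. The sketch below is a minimal illustration under stated assumptions: each annotation is a 2D grid of integer class labels of the same size, and the helper name `agreement_mask` is hypothetical. How the adjusted loss then prioritizes these pixels is not specified in this summary, so that step is left out.

```python
def agreement_mask(annotations):
    """Return a 0/1 grid marking pixels where every annotator gives the same label.

    annotations: non-empty list of masks, one per annotator; each mask is a
    list of rows of integer class labels, and all masks share the same shape.
    """
    first = annotations[0]
    height, width = len(first), len(first[0])
    return [
        [int(all(mask[i][j] == first[i][j] for mask in annotations))
         for j in range(width)]
        for i in range(height)
    ]


# Hypothetical 2x3 binary masks from three annotators.
annotator_1 = [[0, 1, 1],
               [0, 0, 1]]
annotator_2 = [[0, 1, 0],
               [0, 0, 1]]
annotator_3 = [[0, 1, 1],
               [0, 1, 1]]

confident = agreement_mask([annotator_1, annotator_2, annotator_3])
```

Pixels marked 1 are those on which the three hypothetical annotators agree; the remaining pixels are where annotator-specific noise would have to be modeled.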

Paper Structure (44 sections, 10 equations, 19 figures, 9 tables)


Figures (19)

  • Figure 1: Illustration of the proposed architecture. The input image is fed into two separate networks: the Base network (in this example, a Classifier Network) and the Annotation network. Their predictions are then multiplied to produce the predicted distribution of the noisy (annotators') labels. Finally, the loss function is computed by summing all of its components.
  • Figure 2: Ground truth (left column) and predicted confusion matrices for different Annotators using different models: our approach with confidence regularizer ($\lambda=0.01$, $m=2$) (middle column) and without it ($\lambda=0$) (right column) on Curated MNIST.
  • Figure 3: Annotator information for three different styles (MNIST).
  • Figure 5: Illustration of the proposed architecture. The input image is fed into two separate networks: Segmentation and Annotation. Their predictions are then multiplied to produce the predicted noisy (annotators') segmentation masks. Finally, the loss function is computed by summing all of its components.
  • Figure 6: Visualization of the MNIST dataset input image with Gaussian noise, three annotations (thin, thick, fracture), and ground truth.
  • ...and 14 more figures