Dirichlet-Based Prediction Calibration for Learning with Noisy Labels

Chen-Chen Zong, Ye-Wen Wang, Ming-Kun Xie, Sheng-Jun Huang

TL;DR

This work tackles the problem of learning with noisy labels by identifying softmax translation invariance as a primary source of over-confident, unreliable predictions. It introduces Dirichlet-Based Prediction Calibration (DPC), which calibrates the softmax output with a constant in the exponent and models predictions with a Dirichlet distribution, enabling a meaningful probabilistic interpretation and training via evidential deep learning. A large-margin example-selection criterion is developed to leverage the more distinct logits produced by calibration, and the approach is integrated with MixMatch-style semi-supervised learning using a two-head architecture. Across synthetic and real-world noisy datasets, DPC achieves state-of-the-art results, with notable gains on CIFAR-100 under symmetric noise and strong performance when combined with data augmentation, demonstrating the practical impact of calibrated predictions for noisy-label learning. The code is publicly available, facilitating adoption and further research.

Abstract

Noisy labels can significantly hinder the generalization performance of deep neural networks (DNNs). Existing approaches address this issue through loss correction or example selection. However, these methods often rely on model predictions obtained from the softmax function, which can be over-confident and unreliable. In this study, we identify the translation invariance of the softmax function as the underlying cause of this problem and propose the Dirichlet-based Prediction Calibration (DPC) method as a solution. Our method introduces a calibrated softmax function that breaks the translation invariance by incorporating a suitable constant in the exponent term, enabling more reliable model predictions. To ensure stable model training, we leverage a Dirichlet distribution to assign probabilities to predicted labels and introduce a novel evidential deep learning (EDL) loss. The proposed loss function encourages positive and sufficiently large logits for the given label and negative, small logits for the other labels, leading to more distinct logits and facilitating better example selection based on a large-margin criterion. Through extensive experiments on diverse benchmark datasets, we demonstrate that DPC achieves state-of-the-art performance. The code is available at https://github.com/chenchenzong/DPC.
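
The translation invariance described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the calibrated softmax, so the version below is an assumption: it adds a hypothetical constant `gamma` to each exponential term before normalizing. The function names and the value `gamma=1.0` are illustrative and are not taken from the released code.

```python
import math


def softmax(logits):
    """Standard softmax; subtracting the max is safe because of translation invariance."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrated_softmax(logits, gamma=1.0):
    """Assumed calibrated form: (exp(z_k) + gamma) / sum_j (exp(z_j) + gamma).

    The additive constant means a uniform shift of the logits no longer
    cancels out, so the absolute size of the logits affects the output.
    """
    terms = [math.exp(z) + gamma for z in logits]
    total = sum(terms)
    return [t / total for t in terms]


# Two logit vectors with the same relative differences but different magnitudes.
low = [-3.0, -2.0, -1.0]
high = [z + 4.0 for z in low]  # the same vector shifted by +4

# Standard softmax cannot tell the two vectors apart.
print(softmax(low))
print(softmax(high))

# Under the assumed calibrated form, the shifted vector gets a more confident prediction.
print(calibrated_softmax(low))
print(calibrated_softmax(high))
```

In this sketch, uniformly small logits yield a prediction close to uniform, while larger logits with the same relative gaps yield a sharper one, which is the behavior the abstract attributes to breaking translation invariance.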

Paper Structure

This paper contains 12 sections, 14 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: A specific case comparing the softmax-based model with our proposed calibrated Dirichlet-based model. The softmax function is translation invariant, i.e., it reflects only the relative relationship between logits, so it gives the same prediction for $\mathbf{x}_1$ and $\mathbf{x}_2$, which contradicts intuition. We break the translation invariance by placing a suitable constant in the exponent term and propose a corresponding Dirichlet-based training method.
  • Figure 2: Training on CIFAR-10 with a 50% symmetric noise rate. All panels are plotted from the results of the last epoch. (a) The Expected Calibration Error (ECE) on the test data. A smaller ECE is better; correspondingly, a line closer to the dashed line is preferred. The softmax-based DivideMix tends to produce over-confident predictions. (b) The distribution of the maximum predicted probability for training examples. (c) The distribution of the given-label logit for training examples. (d) A comparison of example selection criteria. The proposed large-margin criterion produces more discriminative results.
  • Figure 3: Ablation studies on CIFAR-10 and CIFAR-100, each with a 50% symmetric noise rate. (a) and (b) are the ablation studies of $\beta$ on CIFAR-10 and CIFAR-100. (c) and (d) are the ablation studies of $\gamma$ on CIFAR-10 and CIFAR-100.
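
For readers unfamiliar with the ECE metric mentioned in Figure 2(a), the following is a minimal sketch of the commonly used binned estimate. The number of bins (10), the equal-width binning, and the function name are assumptions; the caption does not state how the paper's ECE values were computed.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| over bins.

    confidences: maximum predicted probability per example, each in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    n_bins:      number of equal-width confidence bins (assumed value)
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Map the confidence to a bin index; a confidence of 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Each bin contributes its calibration gap, weighted by its share of examples.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# An over-confident model: 90% confidence but only half the predictions are right.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
print(overconfident)
```

A well-calibrated model has accuracy close to its mean confidence in every bin, which corresponds to the dashed diagonal in a reliability plot such as the one the caption describes.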