Do not trust what you trust: Miscalibration in Semi-supervised Learning

Shambhavi Mishra, Balamurali Murugesan, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz

TL;DR

This work empirically demonstrates that SSL methods based on pseudo-labels are significantly miscalibrated, and formally identifies the minimization of the min-entropy, a lower bound of the Shannon entropy, as a potential cause of miscalibration.

Abstract

State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing and enhancing the uncertainty of network predictions is of paramount importance in the pseudo-labeling process. In this work, we empirically demonstrate that SSL methods based on pseudo-labels are significantly miscalibrated, and formally show that minimizing the min-entropy, a lower bound of the Shannon entropy, is a potential cause of miscalibration. To alleviate this issue, we integrate a simple penalty term, which enforces the logit distances of the predictions on unlabeled samples to remain low, preventing the network predictions from becoming overconfident. Comprehensive experiments on a variety of SSL image classification benchmarks demonstrate that the proposed solution systematically improves the calibration performance of relevant SSL models while also enhancing their discriminative power, making it an appealing addition for tackling SSL tasks.
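The relation between the pseudo-label loss, the min-entropy, and the Shannon entropy mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the simplified case where the pseudo-label is the argmax of the very same prediction that is being penalized, whereas SSL methods typically derive the pseudo-label from a differently augmented view, so the equivalence is only approximate in practice. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """H(p) = -sum_k p_k log p_k (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def min_entropy(probs):
    """H_inf(p) = -log max_k p_k, which never exceeds H(p)."""
    return -math.log(max(probs))


def pseudo_label_ce(probs):
    """Cross-entropy against the one-hot argmax of the same prediction.

    Assumption for illustration: pseudo-label and prediction come from
    the same forward pass, so this equals the min-entropy exactly.
    """
    k = max(range(len(probs)), key=probs.__getitem__)
    return -math.log(probs[k])


if __name__ == "__main__":
    for logits in ([0.0, 0.0, 0.0], [2.0, 1.0, 0.0], [8.0, 1.0, 0.0]):
        p = softmax(logits)
        print(
            f"logits={logits} "
            f"shannon={shannon_entropy(p):.4f} "
            f"min_entropy={min_entropy(p):.4f} "
            f"pseudo_label_ce={pseudo_label_ce(p):.4f}"
        )
```

The bound follows because every term of the Shannon entropy satisfies -log p_k >= -log max_k p_k, so the weighted average is at least the min-entropy. Driving the pseudo-label loss to zero therefore only requires pushing the largest probability toward 1, which is consistent with the overconfidence the abstract describes.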

Paper Structure (18 sections, 8 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: Observation 1. Reliability plots for a baseline supervised model (trained with \ref{eq:ce}) and three representative SSL approaches (trained with \ref{eq:start-eq}) on CIFAR-100. These plots empirically highlight the calibration degradation observed when training with the standard unsupervised loss, despite the gains achieved in discrimination.
  • Figure 2: Observation 2. (Left) The unsupervised term in pseudo-label SSL is (approximately) equivalent to the min-entropy, a lower bound of the Shannon entropy. (Middle) Compared to the Shannon entropy, the min-entropy is more aggressive in its gradient dynamics, particularly at the beginning of training, when most predictions are uncertain. (Right) Ratio of samples with the same hard prediction for weak and strong augmentations that were above the selection threshold of three relevant SSL methods.
  • Figure 3: Observation 3. Kernel density estimates of the logit distributions obtained by (left) the supervised baseline trained with \ref{eq:ce} and (right) FreeMatch on STL-10, for the samples belonging to class 5. Even for non-target classes ($k \neq 5$), the logit magnitudes in FreeMatch are larger, which translates to higher overconfidence in both correct and incorrect predictions. We select STL-10 because it has fewer classes (10 vs. 100 in CIFAR-100).
  • Figure 4: Friedman rank for the methods analyzed in Tables \ref{table:tab-main} and \ref{table:tab-main-ECE}, following \cite{wang2022usb}: a) MixMatch, b) Dash, c) AdaMatch, d) DeFixMatch, e) FixMatch, f) FixMatch + Ours, g) FlexMatch, h) FlexMatch + Ours, i) FreeMatch and j) FreeMatch + Ours.
  • Figure 5: Impact of the proposed solution on the logits. Kernel density estimates of the per-class logit distributions for target class 5, for the supervised baseline (left), the original FreeMatch (middle) and our version (right).
  • ...and 6 more figures
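The reliability plots in Figure 1 compare confidence against accuracy per confidence bin, and the same bins yield the expected calibration error (ECE). The sketch below shows one common way to compute both. It is an illustration only: the equal-width binning, the default of 10 bins, and the function names are assumptions, and the paper's exact evaluation protocol is not stated in this section.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    confidences: max softmax probability per sample, in [0, 1].
    correct: 1 if the predicted class matches the label, else 0.
    Returns one dict per bin with count, mean confidence, and accuracy.
    """
    sums = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, conf sum, hits
    for conf, hit in zip(confidences, correct):
        # Bin i covers [i/n_bins, (i+1)/n_bins); conf == 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        sums[idx][0] += 1
        sums[idx][1] += conf
        sums[idx][2] += hit
    bins = []
    for count, conf_sum, hits in sums:
        if count == 0:
            bins.append({"count": 0, "confidence": None, "accuracy": None})
        else:
            bins.append({
                "count": count,
                "confidence": conf_sum / count,
                "accuracy": hits / count,
            })
    return bins


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy - confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in reliability_bins(confidences, correct, n_bins):
        if b["count"] > 0:
            ece += (b["count"] / n) * abs(b["accuracy"] - b["confidence"])
    return ece


if __name__ == "__main__":
    # Toy data: an overconfident model that is right only half the time.
    confs = [0.99, 0.99, 0.99, 0.99]
    hits = [1, 0, 1, 0]
    print(expected_calibration_error(confs, hits))
```

In a reliability plot, each non-empty bin is drawn as a bar whose height is its accuracy; bars falling below the diagonal indicate overconfidence, which is the pattern Figure 1 reports for the SSL methods.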