Variational Learning Induces Adaptive Label Smoothing

Sin-Han Yang, Zhedong Liu, Gian Maria Marconi, Mohammad Emtiyaz Khan

TL;DR

This work demonstrates that variational learning inherently produces adaptive label smoothing by shaping a per-example label-noise term through the posterior over parameters. By deriving exact noise forms in logistic regression, GLMs, and neural networks via IVON, the authors show that posterior expectations create instance-specific smoothing without manual adaptive strategies. Empirically, IVON outperforms traditional label smoothing and matches or exceeds the performance of tuned baselines like SAM across synthetic and real noisy datasets, including Clothing1M, while requiring no hyperparameter tuning. The results bridge Bayesian inference with label smoothing, offering a robust, scalable framework for handling mislabels, calibration, and distribution shifts in deep learning.

Abstract

We show that variational learning naturally induces an adaptive label smoothing where label noise is specialized for each example. Such label-smoothing is useful to handle examples with labeling errors and distribution shifts, but designing a good adaptivity strategy is not always easy. We propose to skip this step and simply use the natural adaptivity induced during the optimization of a variational objective. We show empirical results where a variational algorithm called IVON outperforms traditional label smoothing and yields adaptivity strategies similar to those of an existing approach. By connecting Bayesian methods to label smoothing, our work provides a new way to handle overconfident predictions.
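For contrast with the adaptive scheme described above, traditional label smoothing can be sketched as follows. This is a minimal illustration assuming the common uniform-mixing form, in which every example receives the same smoothing rate `alpha`; the paper's own definition is not reproduced in this section, and the helper name `smooth_labels` is hypothetical.

```python
import numpy as np

def smooth_labels(y_onehot, alpha):
    """Uniform label smoothing (assumed form): mix one-hot targets
    with the uniform distribution over classes."""
    num_classes = y_onehot.shape[-1]
    return (1.0 - alpha) * y_onehot + alpha / num_classes

# Two examples with classes 0 and 2 out of 3 classes.
y = np.eye(3)[[0, 2]]
smoothed = smooth_labels(y, alpha=0.1)
print(smoothed)
```

Every row is perturbed by the same amount regardless of how typical the example is, which is the non-adaptive behavior that the variational approach is contrasted against.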

Paper Structure

This paper contains 24 sections, 2 theorems, 34 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

A gradient update $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \rho_t \nabla_{\boldsymbol{\theta}_t} \mathcal{L}(\boldsymbol{\theta}_t)$ is equivalent to the gradient update in Eq. \ref{eq:gd} where the labels $y_i$ are replaced by $y_i + \epsilon_{i|t}$, with noise defined as

Figures (10)

  • Figure 1: Given a regular $6$ digit (top) and an atypical one (bottom), Label Smoothing (LS) assigns the same label noise to both (gray bars) while variational learning assigns higher noise to the atypical example (red bars). Adaptivity naturally arises due to the posterior.
  • Figure 2: We plot label noise magnitude $\epsilon_{i|t}$ from Eq. \ref{eq:labelnoiseGD_sim} by varying the mean $f_{i|t}$ of $q_t(f_i)$ while fixing its variance to 1. The noise is large around 0 (but not at 0) with large peaks on both sides.
  • Figure 3: Label noise assigned by IVON and LS on the MNIST dataset. Examples are ordered according to IVON's noise, and the highest- and lowest-noise examples are visualized. We see that high noise is assigned to atypical examples while low noise is assigned to regular ones.
  • Figure 4: Smoothed label comparison among IVON, LS [Szegedy_rethinking], and Online Label Smoothing (OLS) [zhang2021delving]. IVON has an adaptive label smoothing effect similar to that of OLS. $\alpha$ is the smoothing rate defined in Eq. \ref{equ: label smoothing def}. The y-axis is in log scale. For CIFAR-100, we show 10 randomly picked classes due to the image size limit.
  • Figure 5: Results on CIFAR-10 with symmetric noisy labels. Top: IVON outperforms Label Smoothing (LS) with different smoothing rates $\alpha$. Bottom: IVON is comparable to SAM's peak performance, while SAM is sensitive to the choice of perturbation $\rho$. Accuracy improvements are shown in blue. Results are reported over 5 random seeds.
  • ...and 5 more figures
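
The shape described in the Figure 2 caption can be reproduced numerically under one assumption. For logistic regression, the expected gradient under $q_t$ can be rewritten as a plain gradient with labels $y_i + \epsilon_{i|t}$, where $\epsilon_{i|t} = \sigma(f_{i|t}) - \mathbb{E}_{q_t}[\sigma(f_i)]$. The theorem statement in this section is truncated, so this form is an assumption rather than a quotation of the paper's equation. The sketch below evaluates the expectation with Gauss-Hermite quadrature; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def expected_sigmoid(mean, var, num_nodes=64):
    """E[sigmoid(f)] for f ~ N(mean, var), via probabilists'
    Gauss-Hermite quadrature (weight function exp(-x^2 / 2))."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(num_nodes)
    f = mean + np.sqrt(var) * nodes
    return np.sum(weights * sigmoid(f)) / np.sqrt(2.0 * np.pi)

def label_noise(mean, var=1.0):
    """Assumed noise form: sigmoid at the mean minus its posterior expectation."""
    return sigmoid(mean) - expected_sigmoid(mean, var)

# Sweep the mean of q(f) with the variance fixed to 1, as in the caption.
means = np.linspace(-6.0, 6.0, 121)
noise = np.array([label_noise(m) for m in means])
peak_location = means[np.argmax(np.abs(noise))]
print("largest |noise| at mean =", peak_location)
```

Under this assumed form the noise vanishes at a mean of 0 by symmetry, grows in magnitude on either side, and decays again for confident predictions, which matches the qualitative description in the caption.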

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2