Robust Semi-supervised Learning via $f$-Divergence and $α$-Rényi Divergence

Gholamali Aminian; Amirhossien Bagheri; Mahyar JafariNodeh; Radmehr Karimian; Mohammad-Hossein Yassaee

TL;DR

This work develops a unified, divergence-guided framework for robust semi-supervised learning (SSL) by introducing divergence-based empirical risks (DER) grounded in $f$-divergences and $α$-Rényi divergences. It defines DER first for supervised learning and then extends it to SSL through pseudo-labeling and entropy minimization, with regularizers that encourage diverse, confident predictions while mitigating confirmation bias. The proposed DP-SSL and DEM-SSL algorithms implement uncertainty-aware pseudo-labeling and entropy-based regularization, and demonstrate robustness to noisy pseudo-labels on datasets such as CIFAR-100 and Letters. The results indicate that certain divergences (e.g., Jensen-Shannon) can offer superior robustness under label noise. The framework also provides theoretical upper bounds linking SSL costs to the fully supervised risk, and could be integrated into popular SSL paradigms.
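As a reminder of the two divergence families named above, the following is a minimal sketch for discrete distributions using the textbook definitions $D_f(P\|Q)=\sum_i q_i f(p_i/q_i)$ and $D_α(P\|Q)=\frac{1}{α-1}\log\sum_i p_i^{α} q_i^{1-α}$. The function names (`f_divergence`, `renyi_divergence`, and the generator helpers) are illustrative assumptions and are not taken from the paper; the sketch also assumes every $q_i > 0$.

```python
import math
from typing import Callable, Sequence


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(P||Q) = sum_i q_i * f(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t: float) -> float:
    """Generator of the KL divergence, with the convention 0 * log 0 = 0."""
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t: float) -> float:
    """Generator of the total variation distance."""
    return 0.5 * abs(t - 1.0)


def js_generator(t: float) -> float:
    """Generator of the Jensen-Shannon divergence (natural logarithm)."""
    first = t * math.log(2.0 * t / (1.0 + t)) if t > 0 else 0.0
    return 0.5 * (first + math.log(2.0 / (1.0 + t)))


def renyi_divergence(p: Sequence[float], q: Sequence[float],
                     alpha: float) -> float:
    """alpha-Renyi divergence for alpha > 0, alpha != 1; assumes q_i > 0."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum((pi ** alpha) * (qi ** (1.0 - alpha)) for pi, qi in zip(p, q))
    return math.log(total) / (alpha - 1.0)


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print("KL :", f_divergence(p, q, kl_generator))
    print("TV :", f_divergence(p, q, tv_generator))
    print("JS :", f_divergence(p, q, js_generator))
    print("Renyi(alpha=2):", renyi_divergence(p, q, 2.0))
```

Passing the generator as a function keeps a single code path for every $f$-divergence, which mirrors the idea of swapping the divergence inside one empirical-risk template.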

Abstract

This paper investigates a range of empirical risk functions and regularization methods suitable for self-training methods in semi-supervised learning. These approaches draw inspiration from divergence measures such as $f$-divergences and $α$-Rényi divergences. Building on the theoretical foundations of these divergences, we also provide insights that clarify the behavior of our empirical risk functions and regularization techniques. Pseudo-labeling and entropy minimization, as self-training methods, suffer from an inherent mismatch between the true label and the pseudo-label (noisy pseudo-labels); some of our empirical risk functions are robust to such noisy pseudo-labels. Under some conditions, our empirical risk functions outperform traditional self-training methods.
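One common intuition for why a divergence-based loss can tolerate noisy pseudo-labels is boundedness: the Jensen-Shannon divergence never exceeds $\log 2$, whereas cross-entropy grows without bound when the model assigns little mass to a (possibly wrong) pseudo-label. The sketch below illustrates only this intuition on a single example; it is not the paper's DER definition or its robustness argument, and the helper names (`one_hot`, `js_loss`, `ce_loss`) are hypothetical.

```python
import math
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Hard (pseudo-)label as a one-hot probability vector."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P||Q) with the convention 0 * log 0 = 0; needs q_i > 0 where p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_loss(target: Sequence[float], pred: Sequence[float]) -> float:
    """Jensen-Shannon divergence between target and prediction (at most log 2)."""
    mix = [0.5 * (t + q) for t, q in zip(target, pred)]
    return 0.5 * kl(target, mix) + 0.5 * kl(pred, mix)


def ce_loss(target: Sequence[float], pred: Sequence[float]) -> float:
    """Cross-entropy between target and prediction (unbounded above)."""
    return -sum(t * math.log(q) for t, q in zip(target, pred) if t > 0)


if __name__ == "__main__":
    # The model is confident in class 0; the pseudo-label wrongly says class 1.
    pred = [0.98, 0.01, 0.01]
    noisy = one_hot(1, 3)
    clean = one_hot(0, 3)
    print("clean label: CE =", ce_loss(clean, pred), " JS =", js_loss(clean, pred))
    print("noisy label: CE =", ce_loss(noisy, pred), " JS =", js_loss(noisy, pred))
    print("JS upper bound log 2 =", math.log(2.0))
```

With a wrong pseudo-label, the cross-entropy penalty keeps increasing as the model becomes more confident in the other class, while the Jensen-Shannon penalty saturates below $\log 2$, so a single mislabeled example cannot dominate the average loss.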

Paper Structure (20 sections, 3 theorems, 29 equations, 7 tables, 1 algorithm)

Introduction
Preliminaries
Problem Formulation
Divergence and Entropy
Soft-label And Hard-label
Divergence-based empirical risk
DER For SL Application
DER For SSL Application
Pseudo-labeling
Entropy Minimization
Robustness
Algorithms
Experiments And Discussion
Conclusion And Future Works
Related Works
...and 5 more sections

Key Result

Theorem 1

Suppose that there exists an increasing function $G:[0,\infty)\to[0,\infty)$ such that, for a generator function $f(t)$, $G(D_f(\cdot\|\cdot))$ is a metric on the space of probability distributions. Then, the following holds, where $\hat{R}_D^{\mathrm{FSL}}(\theta,\mathbf{Z}^{l},\mathbf{X}_m^u,\mathbf{Y}_t^u) := D_{f}\left( P_t\|P_{\theta^\star} \right)$ is the empirical risk of the FSL scenario, $P_t=P_t(

Theorems & Definitions (6)

Remark 1: Comparison with wei2020optimizing
Theorem 1
Proposition 1
Corollary 1
Remark 2: $\mathrm{TV}$-ERM
Remark 3: Comparison with aminian2022information and he2022information
