Semi-Supervised Learning guided by the Generalized Bayes Rule under Soft Revision
Stefan Dietrich, Julian Rodemann, Christoph Jansen
TL;DR
This work addresses robust pseudo-label selection in semi-supervised learning under epistemic uncertainty by adopting credal sets and a generalized Bayes framework. It introduces the Gamma-Maximin criterion with soft revision via $\alpha$-cuts to systematically hedge against prior misspecification, and implements a logistic-model pipeline using Laplace approximation, BFGS, and COBYLA for computation. The paper provides a formal optimization formulation and demonstrates through simulations and real data that the proposed method performs strongly when labeled data are scarce, often outperforming conventional baselines and PPP variants. The approach offers a principled, conservative mechanism to leverage unlabeled data in practical SSL settings, potentially improving robustness to modeling assumptions.
Abstract
We provide a theoretical and computational investigation of the Gamma-Maximin method with soft revision, which was recently proposed as a robust criterion for pseudo-label selection (PLS) in semi-supervised learning. In contrast to traditional methods for PLS, we use credal sets of priors ("generalized Bayes") to represent the epistemic modeling uncertainty. These credal sets are then updated by the Gamma-Maximin method with soft revision. Finally, we select the pseudo-labeled data that are most likely in light of the least favorable distribution from the updated credal set. We formalize the task of finding optimal pseudo-labeled data with respect to the Gamma-Maximin method with soft revision as an optimization problem. A concrete implementation for the class of logistic models then allows us to compare the predictive power of the method with competing approaches. We observe that the Gamma-Maximin method with soft revision can achieve very promising results, especially when the proportion of labeled data is low.
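The selection rule described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: it assumes a one-parameter logistic model without intercept, a finite grid of parameter values, and a credal set made up of a handful of discretized Gaussian priors, so that marginal likelihoods can be computed by direct summation instead of numerical approximation. The function names (`log_lik`, `alpha_cut`, `gamma_maximin_select`), the example data, and the reading of soft revision as "keep every prior whose marginal likelihood is at least `alpha` times the largest one" are assumptions made for this illustration.

```python
import numpy as np

def log_lik(theta, X, y):
    """Log-likelihood of a one-parameter logistic model for every theta on the grid."""
    z = np.outer(theta, X)  # shape (n_theta, n_obs)
    # log sigmoid(z) = -logaddexp(0, -z); log(1 - sigmoid(z)) = -logaddexp(0, z)
    return -(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z)).sum(axis=1)

def alpha_cut(priors, lik, alpha):
    """Indices of priors whose marginal likelihood is >= alpha times the maximum."""
    marginal = priors @ lik  # m(D | pi) for each prior pi in the credal set
    return np.flatnonzero(marginal >= alpha * marginal.max())

def gamma_maximin_select(theta, priors, X, y, X_unlab, y_pseudo, alpha):
    """Pick the pseudo-labeled candidate with the best worst-case predictive probability."""
    lik = np.exp(log_lik(theta, X, y))            # likelihood of the labeled data
    keep = alpha_cut(priors, lik, alpha)          # soft revision (assumed alpha-cut rule)
    posts = priors[keep] * lik                    # unnormalized posteriors, one row per prior
    posts /= posts.sum(axis=1, keepdims=True)
    # p(y_pseudo_i | x_i, theta) for every grid value theta and candidate i
    p1 = 1.0 / (1.0 + np.exp(-np.outer(theta, X_unlab)))
    cand = np.where(y_pseudo == 1, p1, 1.0 - p1)  # shape (n_theta, n_cand)
    pred = posts @ cand                           # posterior predictive, (n_kept, n_cand)
    worst = pred.min(axis=0)                      # least favorable prior per candidate
    return int(worst.argmax()), worst, keep

# Hypothetical toy data: four labeled points and three pseudo-labeled candidates.
theta = np.linspace(-4, 4, 81)
prior_means = [-1.0, 0.0, 1.0, 2.0]
priors = np.array([np.exp(-0.5 * (theta - m) ** 2) for m in prior_means])
priors /= priors.sum(axis=1, keepdims=True)
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
X_unlab = np.array([-3.0, 0.1, 2.5])
y_pseudo = np.array([0, 1, 1])

best, worst, keep = gamma_maximin_select(theta, priors, X, y, X_unlab, y_pseudo, alpha=0.5)
print("selected candidate:", best, "worst-case probabilities:", worst, "kept priors:", keep)
```

In this sketch, `alpha` controls how conservative the selection is: `alpha=0` keeps the whole credal set, while `alpha=1` keeps only the prior or priors with maximal marginal likelihood. The method investigated in the paper works with logistic models and numerical optimization rather than a parameter grid, so the sketch only conveys the max-min structure of the criterion.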
