ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection
Erik Wallin, Lennart Svensson, Fredrik Kahl, Lars Hammarstrand
TL;DR
ProSub tackles open-set semi-supervised learning by introducing an ID subspace-based score, $s(\boldsymbol{z})$, defined as the cosine of the angle between features and the ID subspace $W_\text{id}$ spanned by class means. It then learns probabilistic predictions for ID/OOD by modeling $p_\text{id}(s)$ and $p_\text{ood}(s)$ as Beta distributions and estimating their parameters via a batch EM-like IMM procedure, enabling sampling-based ID/OOD decisions. A subspace loss $\text{l}_{\text{sub}}$ and a FixMatch-like pseudo-labeling scheme, augmented with cosine self-supervision, drive the model to separate ID from OOD while learning robust representations from all unlabeled data. Across multiple benchmarks, ProSub achieves state-of-the-art closed-set accuracy and AUROC for ID/OOD detection, demonstrating that probabilistic, subspace-based ID/OOD discrimination coupled with SSL signals yields strong OSSL performance; code is available at the provided repository.
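As a rough illustration of the subspace score described above, the following NumPy sketch computes the cosine of the angle between a feature vector and the subspace spanned by a set of class means. The function name `subspace_score`, the array layout (one class mean per row), and the assumption that the class means are linearly independent are illustrative choices, not details taken from the ProSub code.

```python
import numpy as np

def subspace_score(z, class_means):
    """Cosine of the angle between feature z (shape (D,)) and the
    subspace spanned by the rows of class_means (shape (K, D), K <= D).

    Assumes z is nonzero and the class means are linearly independent.
    """
    # Orthonormal basis of the spanned subspace: columns of Q, shape (D, K).
    Q, _ = np.linalg.qr(class_means.T)
    # Orthogonal projection of z onto the subspace.
    projection = Q @ (Q.T @ z)
    # cos(angle) = ||projection|| / ||z||, which lies in [0, 1].
    return np.linalg.norm(projection) / np.linalg.norm(z)

# Toy example: the subspace is the xy-plane in 3D.
means = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
in_plane = subspace_score(np.array([1.0, 1.0, 0.0]), means)    # feature inside the subspace
orthogonal = subspace_score(np.array([0.0, 0.0, 1.0]), means)  # feature orthogonal to it
```

Under this definition, a feature lying in the subspace scores close to 1 and a feature orthogonal to it scores close to 0, which is what lets the score act as an ID/OOD signal.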
Abstract
In open-set semi-supervised learning (OSSL), we consider unlabeled datasets that may contain unknown classes. Existing OSSL methods often use the softmax confidence for classifying data as in-distribution (ID) or out-of-distribution (OOD). Additionally, many works for OSSL rely on ad-hoc thresholds for ID/OOD classification, without considering the statistics of the problem. We propose a new score for ID/OOD classification based on angles in feature space between data and an ID subspace. Moreover, we propose an approach to estimate the conditional distributions of scores given ID or OOD data, enabling probabilistic predictions of data being ID or OOD. These components are put together in a framework for OSSL, termed ProSub, that is experimentally shown to reach SOTA performance on several benchmark problems. Our code is available at https://github.com/walline/prosub.
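To make the idea of probabilistic ID/OOD predictions concrete, here is a minimal sketch that applies Bayes' rule to two Beta-distributed score densities, as mentioned in the TL;DR. The Beta parameters, the prior `prior_id`, and the function names are hypothetical placeholders; how ProSub actually estimates these quantities is not reproduced here.

```python
import math

def beta_pdf(s, a, b):
    """Density of Beta(a, b) at s, for s strictly between 0 and 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(s) + (b - 1) * math.log(1 - s))

def prob_id(s, id_params, ood_params, prior_id=0.5):
    """Posterior probability that score s was produced by ID data.

    id_params and ood_params are (a, b) tuples for the assumed
    Beta densities of the score under ID and OOD data respectively.
    """
    weighted_id = prior_id * beta_pdf(s, *id_params)
    weighted_ood = (1.0 - prior_id) * beta_pdf(s, *ood_params)
    return weighted_id / (weighted_id + weighted_ood)

# Hypothetical parameters: ID scores concentrate near 1, OOD scores near 0.
p_high = prob_id(0.8, id_params=(5, 2), ood_params=(2, 5))
p_low = prob_id(0.2, id_params=(5, 2), ood_params=(2, 5))
```

A sampling-based decision could then draw a uniform random number and label the sample as ID when it falls below the posterior, instead of comparing the score against a fixed threshold.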
